Preface



This volume contains the notes of the lectures given at the Fourth Interna- 
tional School on Advanced Functional Programming, held August 19-24, 2002, at 
St. Anne’s College in Oxford, UK. 

This School was preceded by earlier ones in Båstad (1995, Sweden, LNCS
925), Olympia, WA (1996, USA, LNCS 1129), and Braga (1998, Portugal, LNCS 
1608). The goal of this series of schools is to make recent developments in the 
area of functional programming widely available. The notes are published to 
enable individuals, small groups of students, and lecturers to study recent work 
in the rapidly developing area of functional programming. 

The lectures in this School introduce tools, language features, domain-specific
languages, problem domains, or programming methods. All lectures are
accompanied by software, and all lectures contain exercises and practical
assignments. Most of the resources can be downloaded via the website of the
School: http://www.functional-programming.org/afp/afp4/.



The Lectures 

Richard Bird and Jeremy Gibbons show how to construct a program for arith- 
metic coding. They use the theory of folds and unfolds to develop both a program 
for arithmetic coding and decoding, and a proof of correctness. The elegant result 
shows that using theory can make a difficult proof digestible. 

Manuel Chakravarty and Gabriele Keller tackle the performance problem of
Haskell’s standard arrays. They introduce an array library with which
array-centric algorithms can be coded elegantly and which has very good
performance.

Koen Claessen and Colin Runciman show how to use QuickCheck to specify
program properties and to test these properties on functional programs, and how
to use Hat to trace computations. The combination of the two tools, which can
be used to trace computations that produce unexpected results, is a powerful
debugging tool.

Matthias Felleisen explains how to develop interactive Web programs in func- 
tional Scheme with continuations, using DrScheme and its built-in Web server. 
The support from a continuation mechanism makes these programs safe for back- 
tracking and cloning. In the exercises he shows how to construct a registration 
site for a school like the Advanced Functional Programming School. 

Cedric Fournet and Fabrice Le Fessant give an overview of concurrent, dis- 
tributed, and mobile programming with JoCaml. JoCaml is an extension of the 
Objective Caml language with support for lightweight concurrency and synchro- 
nization, the distributed execution of programs, and the dynamic relocation of 
active program fragments during execution. 

Paul Hudak discusses the domain-specific language Functional Reactive
Programming (FRP) for programming reactive hybrid systems such as robots. The
key ideas in FRP are its notions of behaviors (continuous, time-varying values)
and events (time-ordered sequences of discrete values). The School concluded
with a soccer match between two soccer team simulators. Both teams actually
managed to score a goal (and to run around in circles in faraway corners).

Finally, Phil Wadler introduces XQuery, a query language for XML designed 
by the World Wide Web Consortium, the standards body responsible for HTML 
and XML. Like SQL and OQL, XQuery is a functional language, and it has a 
type system based on XML Schema. 



Acknowledgements 

We want to thank everyone who made this School a successful and pleasant 
event. The lecturers, all world-class researchers in functional programming, care- 
fully prepared their lecture notes, exercises, and practical assignments, and gave 
inspiring lectures. The students actively participated in the School, and chal- 
lenged the lecturers. Jane Ellory and Jeremy Gibbons of Oxford University took 
care of the many details of the local organization. The Oxford Computing Lab 
took care of the computers, and the people of St. Anne’s took care of the rest 
of our needs. Microsoft sponsored the School. ICS, Utrecht University took the 
financial risk of the School. 



February 2003 



Johan Jeuring 
Simon Peyton Jones 
Organizers AFP 2002 



Table of Contents

Arithmetic Coding with Folds and Unfolds
    Richard Bird and Jeremy Gibbons

An Approach to Fast Arrays in Haskell
    Manuel M. T. Chakravarty and Gabriele Keller

Testing and Tracing Lazy Functional Programs Using QuickCheck and Hat
    Koen Claessen, Colin Runciman, Olaf Chitil, John Hughes,
    and Malcolm Wallace

Developing Interactive Web Programs
    Matthias Felleisen

JoCaml: A Language for Concurrent Distributed and Mobile Programming
    Cedric Fournet, Fabrice Le Fessant, Luc Maranget, and Alan Schmitt

Arrows, Robots, and Functional Reactive Programming
    Paul Hudak, Antony Courtney, Henrik Nilsson, and John Peterson

XQuery: A Typed Functional Language for Querying XML
    Philip Wadler

Author Index




Arithmetic coding with folds and unfolds 



Richard Bird and Jeremy Gibbons 

Programming Research Group, Oxford University 
Wolfson Building, Parks Road, Oxford, OX1 3QD, UK



1 Introduction 

Arithmetic coding is a method for data compression. Although the idea was 
developed in the 1970’s, it wasn’t until the publication of an “accessible imple- 
mentation” [14] that it achieved the popularity it has today. Over the past ten 
years arithmetic coding has been refined and its advantages and disadvantages 
over rival compression schemes, particularly Huffman [9] and Shannon-Fano [5] 
coding, have been elucidated. Arithmetic coding produces a theoretically optimal 
compression under much weaker assumptions than Huffman and Shannon-Fano, 
and can compress within one bit of the limit imposed by Shannon’s Noiseless 
Coding Theorem [13]. Additionally, arithmetic coding is well suited to adaptive 
coding schemes, both character and word based. For recent perspectives on the 
subject, see [10, 12]. 

The “accessible implementation” of [14] consisted of a 300 line C program, 
and much of the paper was a blow-by-blow description of the workings of the 
code. There was little in the way of proof of why the various steps in the process 
were correct, particularly when it came to the specification of precisely what 
problem the implementation solved, and the details of why the inverse operation 
of decoding was correct. This reluctance to commit to specifications and correct- 
ness proofs seems to be a common feature of most papers devoted to the topic. 
Perhaps this is not surprising, because the plain fact is that arithmetic coding is 
tricky. Nevertheless, our aim in these lectures is to provide a formal derivation 
of basic algorithms for coding and decoding. 

Our development of arithmetic coding makes heavy use of the algebraic laws 
of folds and unfolds. Although much of the general theory of folds and unfolds is 
well-known, see [3, 6], we will need one or two novel results. One concerns a new 
pattern of computation, which we call streaming. In streaming, elements of an 
output list are produced as soon as they are determined. This may sound like 
lazy evaluation but it is actually quite different. 

2 Arithmetic coding, informally 

Arithmetic coding is simple in theory but, as we said above, tricky to implement 
in practice. The basic idea is to: 

1. Break the source message into symbols, where a symbol is some logical
   grouping of characters (or perhaps just a single character).

2. Associate each distinct symbol with a semi-open interval of the unit
   interval [0..1).

3. Successively narrow the unit interval by an amount determined by the
   interval associated with each symbol in the message.

4. Represent the final interval by choosing some fraction within it.



We can capture the basic datatypes and operations in Haskell by defining

    type Fraction = Ratio Integer
    type Interval = (Fraction, Fraction)

    unit :: Interval
    unit = (0, 1)

    within :: Fraction → Interval → Bool
    within x (l, r) = l ≤ x ∧ x < r

    pick :: Interval → Fraction
    pick (l, r) = (l + r)/2



Except where otherwise stated, we assume throughout that 0 ≤ l < r ≤ 1 for
every (l, r) :: Interval, so all intervals are subintervals of the unit interval. The
code above gives a concrete implementation of pick, but all we really require is
that

    pick int `within` int

(We write a prefix function in backquotes to turn it into an infix binary
operator, as in Haskell.)
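As a plain-ASCII rendering of the definitions above, the following sketch should load in GHC as written. The name sampleInterval is a hypothetical example value and is not part of the original text.

```haskell
import Data.Ratio (Ratio, (%))

type Fraction = Ratio Integer
type Interval = (Fraction, Fraction)

-- The unit interval [0, 1).
unit :: Interval
unit = (0, 1)

-- Membership test for the semi-open interval [l, r).
within :: Fraction -> Interval -> Bool
within x (l, r) = l <= x && x < r

-- One admissible choice of a fraction inside an interval: its midpoint.
pick :: Interval -> Fraction
pick (l, r) = (l + r) / 2

-- Hypothetical example value, used only for illustration.
sampleInterval :: Interval
sampleInterval = (1 % 4, 1 % 2)
```

In GHCi, pick sampleInterval `within` sampleInterval checks the one property of pick that the development relies on.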



2.1 Narrowing 

The operation of narrowing takes two intervals i and j and returns a subinterval 
k of i such that k is in the same relationship to i as j is to the unit interval: 

    (▷) :: Interval → Interval → Interval
    (l, r) ▷ (p, q) = (l + (r - l) × p, l + (r - l) × q)

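Diagrammatically, we have:

    [Diagram: three unit intervals, each running from 0 to 1, marking the
    endpoints l and r and the narrowed endpoints l + (r - l) × p and
    l + (r - l) × q.]
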
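A small worked instance may help fix the idea of narrowing. In the sketch below the operator is written as an ordinary function named narrow (an ASCII stand-in chosen here, not a name from the text), and the intervals i and j are hypothetical sample values.

```haskell
import Data.Ratio ((%))

type Fraction = Rational            -- Rational is Ratio Integer
type Interval = (Fraction, Fraction)

-- narrow i j is the subinterval of i that stands to i as j stands to [0, 1).
narrow :: Interval -> Interval -> Interval
narrow (l, r) (p, q) = (l + (r - l) * p, l + (r - l) * q)

-- Hypothetical sample intervals.
i, j :: Interval
i = (1 % 4, 3 % 4)
j = (0, 1 % 2)

-- j is the lower half of the unit interval, so k is the lower half of i.
k :: Interval
k = i `narrow` j
```

Here k evaluates to (1 % 4, 1 % 2), the lower half of i.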
Exercise 1. Prove that x `within` (int1 ▷ int2) ⇒ x `within` int1.

Exercise 2. Show that ▷ is associative with identity unit. Is ▷ commutative?

Exercise 3. Define an inverse ◁ (‘widen’) of ▷ such that (int1 ▷ int2) ◁ int1 = int2.

Exercise 4. Define the notion of the reciprocal i⁻¹ of an interval i, such that

    i ▷ i⁻¹ = unit = i⁻¹ ▷ i

(The reciprocal of a sub-unit interval will in general not itself be a sub-unit.)
Redefine widening in terms of narrowing and reciprocal.






2.2 Models 

In order to encode a message, each symbol has to be associated with a given in- 
terval. For our purposes, Model is an abstract type representing a finite mapping 
from Symbols to Intervals with associated functions: 

    encodeSym :: Model → Symbol → Interval
    decodeSym :: Model → Fraction → Symbol

We assume that the intervals associated with symbols do not overlap: for any
m :: Model and x :: Fraction,

    s = decodeSym m x ⇔ x `within` (encodeSym m s)

Rather than having a single fixed model for the whole message, we allow the 
possibility that the model can change as the message is read; such a scheme is 
called adaptive. For instance, one can begin with a simple model in which symbols 
are associated via some standard mapping with intervals of the same size, and 
then let the model adapt depending on the actual symbols read. Therefore we 
also assume the existence of a function 

    newModel :: Model → Symbol → Model

As long as the decoder performs the same adaptations as the message is recon- 
structed, the message can be retrieved. Crucially, there is no need to transmit 
the model with the message. The idea of an adaptive model is not just a useful re- 
finement on the basic scheme, but also an essential component in the derivation 
of the final algorithms. 
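The text keeps Model abstract. Purely as an illustration, the sketch below assumes one possible representation: a static association list from symbols to intervals, with Symbol taken to be Char and newModel performing no adaptation. These choices, and the value sampleModel, are assumptions made here and are not taken from the original.

```haskell
import Data.Ratio ((%))

type Fraction = Rational
type Interval = (Fraction, Fraction)
type Symbol   = Char

-- Assumption: a model is a finite association list of symbol intervals.
newtype Model = Model [(Symbol, Interval)]

within :: Fraction -> Interval -> Bool
within x (l, r) = l <= x && x < r

-- Look up the interval assigned to a symbol.
encodeSym :: Model -> Symbol -> Interval
encodeSym (Model tbl) s =
  case lookup s tbl of
    Just int -> int
    Nothing  -> error ("encodeSym: unknown symbol " ++ show s)

-- Find the symbol whose interval contains the fraction.
decodeSym :: Model -> Fraction -> Symbol
decodeSym (Model tbl) x =
  case [s | (s, int) <- tbl, x `within` int] of
    s : _ -> s
    []    -> error "decodeSym: fraction lies in no symbol interval"

-- A static model: reading a symbol leaves the model unchanged.
newModel :: Model -> Symbol -> Model
newModel m _ = m

-- Hypothetical three-symbol model with non-overlapping intervals.
sampleModel :: Model
sampleModel =
  Model [('a', (0, 1 % 2)), ('b', (1 % 2, 3 % 4)), ('c', (3 % 4, 1))]
```

An adaptive scheme would replace newModel by a function that updates the table, for example from symbol counts; the decoder must then apply exactly the same update.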

Exercise 5. Specify the stronger condition that the intervals associated with 
symbols partition the unit interval. 



2.3 Encoding 



Having defined the relevant datatypes and auxiliary operations we can now define
arithmetic encoding, which is to compute encode0 m unit, where

    encode0 :: Model → Interval → [Symbol] → Fraction
    encode0 m int = pick · foldl (▷) int · encodeSyms m

    encodeSyms :: Model → [Symbol] → [Interval]
    encodeSyms m ss = unfoldr nextint (m, ss)

    nextint :: (Model, [Symbol]) → Maybe (Interval, (Model, [Symbol]))
    nextint (m, []) = Nothing
    nextint (m, s : ss) = Just (encodeSym m s, (newModel m s, ss))




The function encodeSyms m uses the initial model m to encode the symbols 
of the message as intervals. These intervals are then used to narrow the unit 
interval to some final interval from which some number is chosen. The code 
makes use of the standard Haskell higher-order operators foldl and unfoldr, 
which are discussed in more detail in the following section. 
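To see the encoder run end to end, here is a minimal executable sketch. It assumes a static association-list model with Char symbols; that representation, the function name narrow, and the value sampleModel are illustrative assumptions, not definitions from the text.

```haskell
import Data.List (unfoldr)
import Data.Ratio ((%))

type Fraction = Rational
type Interval = (Fraction, Fraction)
type Symbol   = Char
type Model    = [(Symbol, Interval)]   -- assumption: static association list

narrow :: Interval -> Interval -> Interval
narrow (l, r) (p, q) = (l + (r - l) * p, l + (r - l) * q)

pick :: Interval -> Fraction
pick (l, r) = (l + r) / 2

encodeSym :: Model -> Symbol -> Interval
encodeSym m s = maybe (error "encodeSym: unknown symbol") id (lookup s m)

-- No adaptation in this sketch.
newModel :: Model -> Symbol -> Model
newModel m _ = m

-- Turn the symbols into intervals, threading the model through.
encodeSyms :: Model -> [Symbol] -> [Interval]
encodeSyms m ss = unfoldr nextInt (m, ss)
  where
    nextInt (_, [])         = Nothing
    nextInt (m', s : rest)  = Just (encodeSym m' s, (newModel m' s, rest))

-- Narrow the starting interval by each symbol interval, then pick a fraction.
encode0 :: Model -> Interval -> [Symbol] -> Fraction
encode0 m int = pick . foldl narrow int . encodeSyms m

-- Hypothetical model, for illustration only.
sampleModel :: Model
sampleModel = [('a', (0, 1 % 2)), ('b', (1 % 2, 3 % 4)), ('c', (3 % 4, 1))]
```

With this model, encoding "ab" narrows the unit interval first to (0, 1/2) and then to (1/4, 3/8), so encode0 sampleModel (0, 1) "ab" is the midpoint 5/16.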
2.4 Decoding

What remains is the question of how to perform the inverse operation of
arithmetic decoding. Rather than give a program, we will give a non-executable
specification. The function decode0 :: Model → Interval → Fraction → [Symbol] is
specified by

    ss `begins` (decode0 m int (encode0 m int ss))

for all ss, where xs `begins` ys if ys = xs ++ xs' for some xs'. Thus decode0 is
inverse to encode0 in the sense that it is required to produce the sequence of
symbols that encode0 encodes but is not required to stop after producing them.
Termination is handled separately. Provided we record the number of symbols
in the message, or ensure that it ends with a special end-of-message symbol that
occurs nowhere else, we can stop the decoding process at the right point.

Exercise 6. The Haskell definition of begins :: Eq α ⇒ [α] → [α] → Bool is

    [] `begins` ys = True
    (x : xs) `begins` [] = False
    (x : xs) `begins` (y : ys) = (x == y ∧ xs `begins` ys)

What is the value of [] `begins` ⊥?

Exercise 7. What are the advantages and disadvantages of the two schemes (re- 
turning the length of the message, or making use of a special end-of-message 
symbol) for determining when to stop decoding? 

2.5 Remaining refinements 

Simple though encode0 is, it will not suffice in a viable implementation and this
is where the complexities of arithmetic coding begin to emerge. Specifically:

- we really want an encoding function that returns a list of bits (or bytes)
  rather than a number, not least because of the next requirement;

- for efficiency both in time and space, encoding should produce bits as soon
  as they are known (this is known as incremental transmission, or streaming);

- consequently, decoding should be implemented as a function that consumes
  bits and produces symbols, again in as incremental a manner as possible;

- for efficiency both in time and space, we should replace computations on
  fractions (pairs of arbitrary precision integers) with computations on
  fixed-precision integers, accepting that the consequent loss of accuracy will
  degrade the effectiveness of compression;

- we have to choose a suitable representation of models.

All of the above, except the last, will be addressed in what follows. We warn the 
reader now that there is a lot of arithmetic in arithmetic coding, not just the 
arithmetic of numbers, but also of folds and unfolds.

3 Folds and unfolds

Let us now digress a little to recall some of the theory of folds and unfolds. We 
will return to and augment our understanding of these operators in subsequent 
sections. 

The higher-order operator foldl iterates over a list, from left to right:

    foldl :: (β → α → β) → β → [α] → β
    foldl f e [] = e
    foldl f e (x : xs) = foldl f (f e x) xs

Thus, writing f as an infix operator ⊕, we have

    foldl (⊕) e [x, y, z] = ((e ⊕ x) ⊕ y) ⊕ z

Dually, the higher-order operator foldr iterates over a list, from right to left:

    foldr :: (α → β → β) → β → [α] → β
    foldr f e [] = e
    foldr f e (x : xs) = f x (foldr f e xs)

Thus, foldr (⊕) e [x, y, z] = x ⊕ (y ⊕ (z ⊕ e)). The crucial fact about foldr is the
following universal property: for a strict function h we have

    h = foldr f e ⇔ h [] = e ∧ h (x : xs) = f x (h xs)
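The two bracketings can be observed directly by folding with an operator that is not associative. The sketch below uses subtraction on Integer; the names leftNested and rightNested are chosen here for illustration.

```haskell
-- foldl brackets to the left: ((10 - 1) - 2) - 3
leftNested :: Integer
leftNested = foldl (-) 10 [1, 2, 3]

-- foldr brackets to the right: 1 - (2 - (3 - 10))
rightNested :: Integer
rightNested = foldr (-) 10 [1, 2, 3]
```

The first expression evaluates to 4 and the second to -8, so the two folds agree only when the operator has suitable algebraic properties.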

There is a close relationship between foldl and foldr, captured in part by the 
following two theorems. As the names of the theorems suggest, we are not telling 
the whole story here. 

Theorem 8 (First Duality Theorem [3]). If f is associative with unit e,
then foldl f e xs = foldr f e xs for all finite lists xs.

Theorem 9 (Third Homomorphism Theorem [7]). If both h = foldl f1 e
and h = foldr f2 e, then there is an associative f with unit e such that
h = foldr f e.

From Theorem 8 and Exercise 2, we have

    foldl (▷) unit = foldr (▷) unit

So why don’t we use the arguably more familiar foldr to express arithmetic
coding? The answer lies in the following lemma, which turns out to be an
essential step in obtaining a program for decoding:

Lemma 10.

    foldl (▷) int · encodeSyms m = snd · foldl step (m, int)

where

    step (m, int) s = (newModel m s, int ▷ encodeSym m s)

This lemma shows how two computations, namely turning the sequence of
symbols into a sequence of intervals and then combining that sequence of intervals
into a single interval, can be fused into one. Fusion is perhaps the single most
important general idea for achieving efficient computations. There is no equivalent
lemma if we replace foldl by foldr.

Exercise 11. Using the universal property, prove the fusion theorem for foldr:
provided h is a strict function, h e = e' and h (f x z) = f' x (h z) for every x
and z, we have h · foldr f e = foldr f' e'.

Exercise 12. By defining map as an instance of foldr, prove map fusion:

    foldr f e · map g = foldr (f · g) e

Exercise 13. Why don’t the universal property and the fusion theorem for foldr
hold for non-strict h? Does the First Duality Theorem hold for infinite or partial
lists?

Exercise 14. Suppose that (x ⊕ y) ⊖ x = y for all x and y. Prove that

    foldl (⊖) (foldr (⊕) x ys) ys = x

for all x and finite lists ys.

Exercise 15. ‘Parallel loops’ may also be fused into one: if

    h xs = (foldr f1 e1 xs, foldr f2 e2 xs)

then h = foldr f (e1, e2), where f x (z1, z2) = (f1 x z1, f2 x z2). For example,

    average = uncurry div · sumlength

where sumlength xs = (sum xs, length xs), and sumlength can be written with
a single foldr. Parallel loop fusion is sometimes known as the ‘Banana Split
Theorem’ (because, in days of old, folds were often written using “banana”
brackets; see, for example, [4]). Prove the theorem, again using the universal
property of foldr.

Exercise 16. The function foldl can be expressed in terms of foldr:

    foldl f = flip (foldr (comp f) id) where comp f x u = u · flip f x

Verify this claim, and hence (from the universal property of foldr) derive the
following universal property of foldl: for h strict in its second argument,

    h = foldl f ⇔ h e [] = e ∧ h e (x : xs) = h (f e x) xs

3.1 Unfolds

To describe unfolds, first recall the Haskell standard type Maybe:

    data Maybe α = Just α | Nothing

The function unfoldr is defined by

    unfoldr :: (β → Maybe (α, β)) → β → [α]
    unfoldr f b = case f b of
                    Just (a, b') → a : unfoldr f b'
                    Nothing      → []

For example, the standard Haskell prelude function enumFromTo is very nearly
given by curry (unfoldr next), where

    next (a, b) = if a ≤ b then Just (a, (succ a, b)) else Nothing

(Only ‘very nearly’ because membership of the type class Enum does not actually
imply membership of Ord in Haskell; the comparison is done instead by using
fromEnum and comparing the integers.)
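The example can be tried directly. The sketch below specialises it to Int, which is an assumption made here to sidestep the Enum and Ord issue just mentioned; the name enumFromTo' is likewise chosen for illustration.

```haskell
import Data.List (unfoldr)

-- Produce the next element and the remaining range, or stop when it is empty.
next :: (Int, Int) -> Maybe (Int, (Int, Int))
next (a, b) = if a <= b then Just (a, (succ a, b)) else Nothing

-- enumFromTo for Int, written as an unfold over the pair of bounds.
enumFromTo' :: Int -> Int -> [Int]
enumFromTo' = curry (unfoldr next)
```

For instance, enumFromTo' 3 7 yields [3,4,5,6,7], and an empty range such as enumFromTo' 5 4 yields [].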

The Haskell Library Report [2] states:

    The unfoldr function undoes a foldr operation...:

        unfoldr f' (foldr f z xs) = xs

    if the following holds:

        f' (f x y) = Just (x, y)
        f' z       = Nothing

That’s essentially all the Report says on unfolds! We will have more to say about
them later on.

3.2 Hylomorphisms 

One well-known pattern involving folds and unfolds is that of a hylomorphism 
[11], namely a function h whose definition takes the form 

    h = foldr f e · unfoldr g

The two component computations have complementary structures and they can
be fused into one:

    h z = case g z of
            Nothing      → e
            Just (x, z') → f x (h z')

This particular rule is known as deforestation because the intermediate data 
structure (in this case a list, but in a more general form of hylomorphism it 
could be a tree) is removed. 
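A familiar instance of this pattern is the factorial function, which unfolds a countdown and folds it with multiplication. The sketch below, with names chosen here for illustration, gives both the two-stage version and the fused version obtained by the rule above.

```haskell
import Data.List (unfoldr)

-- Generator: count down from n, stopping at zero.
countdown :: Integer -> Maybe (Integer, Integer)
countdown n = if n <= 0 then Nothing else Just (n, n - 1)

-- Hylomorphism in its unfused form: build the list [n, n-1 .. 1], then multiply.
factUnfused :: Integer -> Integer
factUnfused = foldr (*) 1 . unfoldr countdown

-- Fused form: the intermediate list never appears.
factFused :: Integer -> Integer
factFused z = case countdown z of
  Nothing      -> 1
  Just (x, z') -> x * factFused z'
```

Both functions compute the same results; the fused one simply skips the intermediate list.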
4 Producing bits

Let us now return to arithmetic coding. As we noted above, we would like
encoding to return a list of bits rather than a number. To achieve this aim we
replace the function pick :: Interval → Fraction by two functions

    type Bit = Int    -- 0 and 1 only

    toBits :: Interval → [Bit]
    fromBits :: [Bit] → Fraction

such that pick = fromBits · toBits. Equivalently, for all intervals int, we require

    fromBits (toBits int) `within` int

The ‘obvious’ choices here are to let toBits (l, r) return the shortest binary
fraction x satisfying l ≤ x < r, and fromBits return the value of the binary
fraction. Thus, fromBits = foldr pack 0, where pack b x = (b + x)/2. However,
as Exercises 25 and 26 explain, we reject the obvious definitions and take instead

    fromBits = foldr pack (1/2)
    toBits = unfoldr nextBit

where

    nextBit :: Interval → Maybe (Bit, Interval)
    nextBit (l, r)
      | r ≤ 1/2   = Just (0, (2 × l, 2 × r))
      | 1/2 ≤ l   = Just (1, (2 × l - 1, 2 × r - 1))
      | otherwise = Nothing

Exercise 17. Give an equivalent definition of nextBit in terms of narrowing by 
non-sub-unit intervals. 

We leave it as an exercise to show

    foldr pack (1/2) bs = foldr pack 0 (bs ++ [1])

Thus fromBits bs returns the binary fraction obtained by adding a final 1 to
the end of bs. The definition of toBits has a simple reading: if r ≤ 1/2, then the
binary expansion of any fraction x such that l ≤ x < r begins with 0; and if
1/2 ≤ l, the expansion of x begins with 1. In the remaining case l < 1/2 < r the
empty sequence is returned.
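The definitions of this section can be run as they stand once written in ASCII. The sketch below follows them directly; the conversion fromIntegral in pack is needed because Bit is Int, and sampleInterval is a hypothetical example value.

```haskell
import Data.List (unfoldr)
import Data.Ratio ((%))

type Fraction = Rational
type Interval = (Fraction, Fraction)
type Bit      = Int   -- 0 and 1 only

-- Prepend one bit to a binary fraction.
pack :: Bit -> Fraction -> Fraction
pack b x = (fromIntegral b + x) / 2

-- Value of the bit sequence followed by a final 1.
fromBits :: [Bit] -> Fraction
fromBits = foldr pack (1 / 2)

-- Emit a bit while the interval lies wholly in one half of the unit interval.
nextBit :: Interval -> Maybe (Bit, Interval)
nextBit (l, r)
  | r <= 1 / 2 = Just (0, (2 * l, 2 * r))
  | 1 / 2 <= l = Just (1, (2 * l - 1, 2 * r - 1))
  | otherwise  = Nothing

toBits :: Interval -> [Bit]
toBits = unfoldr nextBit

-- Hypothetical example value.
sampleInterval :: Interval
sampleInterval = (1 % 4, 3 % 8)
```

For this interval toBits yields [0,1,0], and fromBits [0,1,0] is 5/16, which lies between 1/4 and 3/8 as required.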

Proposition 18. length (toBits (l, r)) ≤ -log2 (r - l)

In particular, toBits always yields a finite list given a non-empty interval. 

Proof. The function toBits applied to an interval of width greater than a half
yields the empty sequence of bits:

    0 ≤ l < r ≤ 1 ∧ 1/2 < r - l ⇒ l < 1/2 < r

Moreover, each iteration of nextBit doubles the width of the interval. So if
1/2^(n+1) < r - l ≤ 1/2^n or, equivalently, n ≤ -log2 (r - l) < n+1, then
termination is guaranteed after at most n bits have been produced.

Proposition 19. fromBits (toBits int) `within` int

Proof. The function pick = fromBits · toBits is a hylomorphism, so we obtain

    pick (l, r)
      | r ≤ 1/2     = pick (2 × l, 2 × r)/2
      | 1/2 ≤ l     = (1 + pick (2 × l - 1, 2 × r - 1))/2
      | l < 1/2 < r = 1/2

The proof now follows by appeal to fixpoint induction.

Exercise 20. Show that foldr pack (1/2) bs = foldr pack 0 (bs ++ [1]).

Exercise 21. Show that

    (2 × l, 2 × r) = (0, 1/2) ◁ (l, r)
    (2 × l - 1, 2 × r - 1) = (1/2, 1) ◁ (l, r)

Exercise 22. Show that

    fromBits bs = mean (foldr pack 0 bs, foldr pack 1 bs)
      where mean (x, y) = (x + y)/2

Exercise 23. Show that

    (foldr pack 0 bs, foldr pack 1 bs) = foldl (▷) unit (map encodeBit bs)
      where encodeBit b = (b/2, (b+1)/2)

Exercise 24. One might expect toBits (l, r) to yield the shortest binary fraction
within [l..r), but in fact it does not. What definition does?

Exercise 25. The reason we do not use the shortest binary fraction as the defi- 
nition of toBits is that the streaming condition of Section 5.1 fails to hold with 
this definition. After studying that section, justify this remark. 

Exercise 26. Since we are using intervals that are closed on the left, one might
expect that the guard in the second clause of nextBit would be 1/2 < l. However,
with this definition of fromBits, the result of Exercise 42 in Section 7 fails to
hold. After studying that section, justify this remark.

4.1 Summary of first refinement 

Drawing together the results of this section, we define 

    encode1 :: Model → Interval → [Symbol] → [Bit]
    encode1 m int = toBits · foldl (▷) int · encodeSyms m

The new version of encoding yields a bit sequence rather than a fraction.
However, execution of encode1 still consumes all its input before delivering any
output. Formally, encode1 m ss = ⊥ for all partial or infinite lists ss. Can we do
better?

5 Streaming

The function encode1 consists of an unfoldr after a foldl. Even under lazy
evaluation, the foldl consumes all its input before the unfoldr can start producing
output. For efficiency, we would prefer a definition that is capable of yielding
some output as soon as possible.

To this end, we introduce a new higher-order operator stream, which alter- 
nates between production and consumption. This function has type 

    stream :: (state → Maybe (output, state)) →
              (state → input → state) →
              state → [input] → [output]

and is defined by

    stream f g z xs =
      case f z of
        Just (y, z') → y : stream f g z' xs
        Nothing      → case xs of
                         []     → []
                         x : xs → stream f g (g z x) xs

The function stream describes a process that alternates between producing
output and consuming input. Starting in state z, control is initially passed to the
producer function f, which delivers output until no more can be produced.
Control is then passed to the consumer process g, which consumes the next input x
and delivers a new state. The cycle then continues until the input is exhausted.
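The behaviour of stream is easiest to appreciate on a small run. The sketch below instantiates the producer with nextBit and the consumer with narrowing, written as a function narrow; the helper unstreamed, which composes unfoldr after foldl, is a name introduced here for comparison only.

```haskell
import Data.List (unfoldr)
import Data.Ratio ((%))

type Fraction = Rational
type Interval = (Fraction, Fraction)
type Bit      = Int

-- Alternate between producing output with f and consuming input with g.
stream :: (state -> Maybe (output, state))
       -> (state -> input -> state)
       -> state -> [input] -> [output]
stream f g z xs =
  case f z of
    Just (y, z') -> y : stream f g z' xs
    Nothing      -> case xs of
      []      -> []
      x : xs' -> stream f g (g z x) xs'

narrow :: Interval -> Interval -> Interval
narrow (l, r) (p, q) = (l + (r - l) * p, l + (r - l) * q)

nextBit :: Interval -> Maybe (Bit, Interval)
nextBit (l, r)
  | r <= 1 / 2 = Just (0, (2 * l, 2 * r))
  | 1 / 2 <= l = Just (1, (2 * l - 1, 2 * r - 1))
  | otherwise  = Nothing

-- The unstreamed composition: consume everything, then produce.
unstreamed :: Interval -> [Interval] -> [Bit]
unstreamed int = unfoldr nextBit . foldl narrow int
```

On the finite input [(0, 1 % 2), (1 % 2, 3 % 4)] both stream nextBit narrow (0, 1) and unstreamed (0, 1) give [0,1,0]. On the infinite input repeat (0, 1 % 2), take 3 of the streamed version returns [0,0,0], whereas the unstreamed version never produces any output.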

Exercise 27. Define a variant of stream that alternates between production and
consumption but hands control to the consumer process first.

5.1 The Streaming Theorem 

The relationship between stream and folds and unfolds hinges on the following 
definition: 

Definition 28. The streaming condition for f and g is

    f z = Just (y, z') ⇒ f (g z x) = Just (y, g z' x)

for all z, y, z' and x.

The streaming condition states very roughly that f is invariant under g. By
induction we can then conclude that f is invariant under repeated applications
of g; this is the content of the following lemma:

Lemma 29. If the streaming condition holds for f and g, then

    f z = Just (y, z') ⇒ f (foldl g z xs) = Just (y, foldl g z' xs)

for all z, y, z' and finite lists xs.

Proof. The proof is by induction on xs:

Case []: Immediate.

Case x : xs: Assume f z = Just (y, z'), so by the streaming condition we have
f (g z x) = Just (y, g z' x). Now we reason

      f (foldl g z (x : xs))
    =   {definition of foldl}
      f (foldl g (g z x) xs)
    =   {induction}
      Just (y, foldl g (g z' x) xs)
    =   {definition of foldl}
      Just (y, foldl g z' (x : xs))

Now we come to the crunch. 

Theorem 30. If the streaming condition holds for f and g, then

    unfoldr f (foldl g z xs) = stream f g z xs

for all z and all finite lists xs.

The proof of Theorem 30 uses the following lemma, which states how to prove
that two potentially infinite lists are equal (see [3, §9.3]).

Lemma 31. Define approx by

    approx :: Integer → [α] → [α]
    approx (n + 1) [] = []
    approx (n + 1) (x : xs) = x : approx n xs

Then two arbitrary lists xs and ys are equal iff approx n xs = approx n ys for
all n.

Proof (of Theorem 30). We use a double induction on n and xs to show that,
provided that the streaming condition holds for f and g,

    approx n (unfoldr f (foldl g z xs)) = approx n (stream f g z xs)

for all n, z and finite lists xs. The first step is case analysis on n.

Case 0: Immediate since approx 0 xs = ⊥ for any xs.

Case n + 1: In this case we perform an analysis on f z:

Subcase f z = Just (y, z'): We reason

      approx (n + 1) (unfoldr f (foldl g z xs))
    =   {applying Lemma 29}
      approx (n + 1) (y : unfoldr f (foldl g z' xs))
    =   {definition of approx}
      y : approx n (unfoldr f (foldl g z' xs))
    =   {induction}
      y : approx n (stream f g z' xs)
    =   {definition of approx}
      approx (n + 1) (y : stream f g z' xs)
    =   {definition of stream}
      approx (n + 1) (stream f g z xs)

Subcase f z = Nothing: Now we need a case analysis on xs. The case of the
empty list is immediate since both sides reduce to []. In the remaining case
we reason

      approx (n + 1) (unfoldr f (foldl g z (x : xs)))
    =   {definition of foldl}
      approx (n + 1) (unfoldr f (foldl g (g z x) xs))
    =   {induction}
      approx (n + 1) (stream f g (g z x) xs)
    =   {definition of stream}
      approx (n + 1) (stream f g z (x : xs))

This completes the induction and the proof.

Exercise 32. Show that the streaming condition holds for unCons and snoc,
where

    unCons [] = Nothing
    unCons (x : xs) = Just (x, xs)
    snoc x xs = xs ++ [x]

Exercise 33. What happens to the streaming theorem for partial or infinite lists?

Exercise 34. Recall that

    nextBit :: Interval → Maybe (Bit, Interval)
    nextBit (l, r)
      | r ≤ 1/2     = Just (0, (0, 2) ▷ (l, r))
      | 1/2 ≤ l     = Just (1, (-1, 1) ▷ (l, r))
      | l < 1/2 < r = Nothing

Show that the streaming condition for nextBit and ▷ follows from associativity of ▷
(Exercise 2) and the fact that int1 ▷ int2 is contained in int1 (Exercise 1).



5.2 Summary of second refinement 

At the end of Section 4.1, we had 

    encode1 :: Model → Interval → [Symbol] → [Bit]
    encode1 m int = unfoldr nextBit · foldl (▷) int · encodeSyms m

Since Exercise 34 established the streaming condition for nextBit and ▷, we can
define

    encode2 :: Model → Interval → [Symbol] → [Bit]
    encode2 m int = stream nextBit (▷) int · encodeSyms m

Although encode1 ≠ encode2, the two functions are equal on all finite symbol
sequences, which is all we require.

6 Decoding and stream inversion

The function decode2 :: Model → Interval → [Bit] → [Symbol] corresponding to
encode2 is specified by

    ss `begins` decode2 m int (encode2 m int ss)

for all finite sequences of symbols ss . 

To implement decode2 we have somehow to invert streams. We will make use 
of a function destream with type 

    destream :: (state → Maybe (output, state)) →
                (state → input → state) →
                (state → [output] → input) →
                state → [output] → [input]

The definition of destream is

    destream f g h z ys =
      case f z of
        Just (y, z') → destream f g h z' (ys `after` y)
        Nothing      → x : destream f g h (g z x) ys
                         where x = h z ys

The operator `after` is partial:

    ys `after` y = if head ys = y then tail ys else ⊥

The function destream is dual to stream: when f z produces something, an
element of the input is consumed; when f z produces nothing, an element of
the output is produced using the helper function h. Note that destream always
produces a partial or infinite list, never a finite one.

The relationship between stream and destream is given by the following the- 
orem: 

Theorem 35. Suppose the following implication holds for all z, x and xs:

    f z = Nothing ⇒ h z (stream f g z (x : xs)) = x

Then, provided stream f g z xs returns a finite list, we have

    xs `begins` destream f g h z (stream f g z xs)

Proof. The proof is by a double induction on xs and n, where n is the length of
stream f g z xs.

Case []: Immediate since [] `begins` every list.

Case x : xs: We first consider the subcase f z = Nothing (which includes the
case n = 0):

      destream f g h z (stream f g z (x : xs))
    =   {definition of destream and h z (stream f g z (x : xs)) = x}
      x : destream f g h (g z x) (stream f g z (x : xs))
    =   {definition of stream}
      x : destream f g h (g z x) (stream f g (g z x) xs)

Since (x : xs) `begins` (x : xs') if and only if xs `begins` xs', an appeal to
induction establishes the case.

In the case f z = Just (y, z'), we have n ≠ 0, and so stream f g z' (x : xs)
has length n - 1. We reason

      destream f g h z (stream f g z (x : xs))
    =   {definition of stream}
      destream f g h z (y : stream f g z' (x : xs))
    =   {definition of destream}
      destream f g h z' (stream f g z' (x : xs))

An appeal to induction establishes the case, completing the proof.



6.1 Applying the theorem 

In order to apply the stream inversion theorem, recall Lemma 10 which states
that foldl (▷) int · encodeSyms m = snd · foldl step (m, int) where

    step (m, int) s = (newModel m s, int ▷ encodeSym m s)

This identity allows us to fuse encodeSyms into the narrowing process:

    encode2 m int = unfoldr nextBitM · foldl step (m, int)

where nextBitM is identical to nextBit except that it propagates the model as
an additional argument:

    nextBitM :: (Model, Interval) -> Maybe (Bit, (Model, Interval))
    nextBitM (m, (l, r))
      | r ≤ 1/2   = Just (0, (m, (2 × l, 2 × r)))
      | 1/2 ≤ l   = Just (1, (m, (2 × l - 1, 2 × r - 1)))
      | otherwise = Nothing

Theorem 30 is again applicable and we obtain the following alternative definition
of encode2:

    encode2 m int = stream nextBitM step (m, int)

Now we are ready for stream inversion. Observe that encode2 m int returns a
finite bit sequence on all finite symbol sequences, so it remains to determine h.
Let bs = encode2 m int (s : ss) and x = fromBits bs, so that

    x within (int ▷ encodeSym m s)

We can now reason:

      x within (int ▷ encodeSym m s)
    ≡   {with int = (l, r) and encodeSym m s = (p, q)}
      l + (r - l) × p ≤ x < l + (r - l) × q
    ≡   {arithmetic}
      p ≤ (x - l)/(r - l) < q
    ≡   {definition of decodeSym}
      s = decodeSym m ((x - l)/(r - l))

Hence we can take

    h (m, (l, r)) bs = decodeSym m ((fromBits bs - l)/(r - l))

Putting these pieces together, we therefore obtain

    decode2 m int = destream nextBitM step nextSym (m, int)

    nextSym (m, (l, r)) bs = decodeSym m ((fromBits bs - l)/(r - l))
    step (m, int) s = (newModel m s, int ▷ encodeSym m s)

where nextBitM was defined above.
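
As an illustration, the definitions of encode2 and decode2 can be transcribed into Haskell over exact rationals. This is only a sketch under stated assumptions: stream, fromBits and the model interface (encodeSym, decodeSym, newModel) are assumed to behave as in the earlier sections, and model0 is a hypothetical static model invented for the example.

```haskell
import Data.Ratio ((%))

type Bit      = Int
type Interval = (Rational, Rational)   -- half-open [l, r)
type Model    = [(Char, Interval)]     -- hypothetical static model

stream :: (s -> Maybe (o, s)) -> (s -> i -> s) -> s -> [i] -> [o]
stream f g z xs = case f z of
  Just (y, z') -> y : stream f g z' xs
  Nothing      -> case xs of
    x : xs' -> stream f g (g z x) xs'
    []      -> []

after :: Eq o => [o] -> o -> [o]
after ys y | head ys == y = tail ys
           | otherwise    = error "after: unexpected output"

destream :: Eq o => (s -> Maybe (o, s)) -> (s -> i -> s) -> (s -> [o] -> i)
         -> s -> [o] -> [i]
destream f g h z ys = case f z of
  Just (y, z') -> destream f g h z' (ys `after` y)
  Nothing      -> x : destream f g h (g z x) ys
    where x = h z ys

model0 :: Model
model0 = [('A', (0, 3 % 10)), ('B', (3 % 10, 6 % 10)), ('C', (6 % 10, 1))]

encodeSym :: Model -> Char -> Interval
encodeSym m s = maybe (error "unknown symbol") id (lookup s m)

decodeSym :: Model -> Rational -> Char
decodeSym m x = head [s | (s, (p, q)) <- m, p <= x, x < q]

newModel :: Model -> Char -> Model
newModel m _ = m                       -- static: the model never changes

narrow :: Interval -> Interval -> Interval
narrow (l, r) (p, q) = (l + (r - l) * p, l + (r - l) * q)

nextBitM :: (Model, Interval) -> Maybe (Bit, (Model, Interval))
nextBitM (m, (l, r))
  | r <= 1 / 2 = Just (0, (m, (2 * l, 2 * r)))
  | 1 / 2 <= l = Just (1, (m, (2 * l - 1, 2 * r - 1)))
  | otherwise  = Nothing

step :: (Model, Interval) -> Char -> (Model, Interval)
step (m, int) s = (newModel m s, int `narrow` encodeSym m s)

fromBits :: [Bit] -> Rational
fromBits = foldr pack (1 / 2)
  where pack b x = (fromIntegral b + x) / 2

nextSym :: (Model, Interval) -> [Bit] -> Char
nextSym (m, (l, r)) bs = decodeSym m ((fromBits bs - l) / (r - l))

encode2 :: Model -> Interval -> String -> [Bit]
encode2 m int = stream nextBitM step (m, int)

-- The result is infinite or partial; take the expected number of symbols.
decode2 :: Model -> Interval -> [Bit] -> String
decode2 m int = destream nextBitM step nextSym (m, int)
```

With this static model, encoding "ABAC" from the interval (0, 1) yields the bits 00011, and the first four symbols of the decoded list are again "ABAC".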

This is not a very efficient way to compute decode2. Each computation of
fromBits bs requires that the bit sequence bs is traversed in its entirety. Worse,
this happens each time an output symbol is produced. Better is to fuse the
computation of fromBits into destream so that the bit sequence is processed
only once. We can do this fusion with a somewhat more complicated version of
destream.

6.2 A better stream inversion theorem 

Replace the previous function destream with a more general one, called unstream,
with type

    unstream :: (state -> Maybe (output, state))
             -> (state -> input -> state)
             -> (state -> result -> input)
             -> (result -> output -> result)
             -> state -> result -> [input]

With six arguments this seems a complicated function, which is why we didn’t
give it earlier. The definition of unstream is

    unstream f g h k z w =
      case f z of
        Just (y, z') -> unstream f g h k z' (k w y)
        Nothing      -> x : unstream f g h k (g z x) w
          where x = h z w

This more complicated definition is a generalisation, since destream f g h z
is equivalent to unstream f g h after z. The relationship between stream and
unstream is given by the following theorem, a generalisation of Theorem 35:

Theorem 36. Let process z = foldr (⊕) w · stream f g z. Suppose that

    f z = Nothing  ⇒  h z (process z (x : xs)) = x

for all z, x and xs. Furthermore, suppose that ⊖ satisfies (y ⊕ w) ⊖ y = w for
all y and w. Then, provided stream f g z xs returns a finite list, we have

    xs begins unstream f g h (⊖) z (process z xs)

The proof is so similar to the earlier one that we can leave details as an
exercise. The point of the new version is that, since fromBits = foldr pack (1/2)
where pack b x = (b + x)/2, we can define ⊖ = unpack, where unpack x b =
2 × x - b. As a consequence, we obtain

    decode2 m int bs =
      unstream nextBitM step nextSym unpack (m, int) (fromBits bs)

In this version the bit sequence bs is traversed only once. Nevertheless, decode2
is not an incremental algorithm since all of bs has to be inspected before any
output is produced.
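
To see the roles of ⊕ and ⊖ in isolation, here is a small Haskell sketch of unstream on a hypothetical toy problem that is not taken from the notes: inputs are decimal digits (0 to 9), each is emitted twice, and the outputs are folded into a single number by packDigit. The functions popQ, pushTwice, packDigit, unpackDigit and firstDigit are all assumptions made for the example; stream is assumed to be as in the earlier sections.

```haskell
stream :: (s -> Maybe (o, s)) -> (s -> i -> s) -> s -> [i] -> [o]
stream f g z xs = case f z of
  Just (y, z') -> y : stream f g z' xs
  Nothing      -> case xs of
    x : xs' -> stream f g (g z x) xs'
    []      -> []

unstream :: (s -> Maybe (o, s)) -> (s -> i -> s) -> (s -> r -> i)
         -> (r -> o -> r) -> s -> r -> [i]
unstream f g h k z w = case f z of
  Just (y, z') -> unstream f g h k z' (k w y)
  Nothing      -> x : unstream f g h k (g z x) w
    where x = h z w

popQ :: [Int] -> Maybe (Int, [Int])
popQ []       = Nothing
popQ (y : ys) = Just (y, ys)

pushTwice :: [Int] -> Int -> [Int]
pushTwice q x = q ++ [x, x]

-- Plays the role of (⊕): prepend a decimal digit to a number.
packDigit :: Int -> Integer -> Integer
packDigit y w = fromIntegral y + 10 * w

-- Plays the role of (⊖): unpackDigit (packDigit y w) y == w.
unpackDigit :: Integer -> Int -> Integer
unpackDigit w y = (w - fromIntegral y) `div` 10

-- Helper h: with an empty queue, the next input is the lowest digit.
firstDigit :: [Int] -> Integer -> Int
firstDigit _ w = fromIntegral (w `mod` 10)

-- process z = foldr (⊕) w . stream f g z, with z = [] and w = 0
process :: [Int] -> Integer
process = foldr packDigit 0 . stream popQ pushTwice []

invert :: Int -> Integer -> [Int]
invert n = take n . unstream popQ pushTwice firstDigit unpackDigit []
```

The folded result is consumed incrementally by unpackDigit, so the output list is never rebuilt, which is the point of moving from destream to unstream.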

Exercise 37. Following the steps of the proof of the first version of stream inver- 
sion, prove the second version of stream inversion. 

Exercise 38. What substitutions for ⊕ and w in Theorem 36 yield Theorem 35?

7 Interval expansion 

The major problem with encode2 and decode2 is that they make use of fractional
arithmetic. In Section 8 we are going to replace fractional arithmetic by
arithmetic with limited-precision integers. In order to do so we need a preparatory
step: interval expansion. Quoting from Howard and Vitter [8]:

    The idea is to prevent the current interval from narrowing too much
    when the endpoints are close to 1/2 but straddle 1/2. In that case we do
    not yet know the next output bit, but we do know that whatever it is, the
    following bit will have the opposite value; we merely keep track of that
    fact, and expand the current interval about 1/2. This follow-on procedure
    may be repeated any number of times, so the current interval is always
    strictly longer than 1/4.

For the moment we will just accept the fact that ensuring the width of the
current interval is greater than 1/4 before narrowing is an important step on the
path to limited precision.

Formally, interval expansion is a data refinement in which an interval (l, r)
is represented by a triple of the form (n, (l', r')) satisfying

    l' = scale (n, l) and r' = scale (n, r)

where scale (n, x) = 2^n × (x - 1/2) + 1/2, subject to 0 ≤ l' < r' ≤ 1. In particular,
(0, (l, r)) is one possible representation of (l, r).

A fully-expanded interval for (l, r) is a triple (n, (l', r')) in which n is as
large as possible. Intervals straddling 1/2 will be fully-expanded immediately
before narrowing. The remainder of this section is devoted to installing this data
refinement. More precisely, with ei denoting an expanded interval and contract ei
the corresponding un-expanded interval, our aim is to provide suitable definitions
that justify the following calculation:

      toBits · foldl (▷) int
    =   {assuming int = contract ei}
      toBits · foldl (▷) (contract ei)
    =   {fold-fusion (in reverse) for some function enarrow}
      toBits · contract · foldl enarrow ei
    =   {definition of toBits}
      unfoldr nextBit · contract · foldl enarrow ei
    =   {for some suitable definition of nextBits}
      concat · unfoldr nextBits · foldl enarrow ei
    =   {streaming}
      concat · stream nextBits enarrow ei

The function enarrow connotes “expand and narrow” and is an operation that
first expands an interval before narrowing it. Given this motivating calculation,
we can then define

    encode3 m ei = concat · stream nextBits enarrow ei · encodeSyms m

Arithmetic coding is then implemented by the call encode3 m (0, (0, 1)). Note
that composing concat with stream still gives incremental transmission because
of laziness: the argument to concat does not have to be evaluated fully before
results are produced.

7.1 Defining expand and contract 

First, we give a definition of the function expand that expands intervals. Observe
that

    0 ≤ 2 × (l - 1/2) + 1/2  ≡  1/4 ≤ l
    2 × (r - 1/2) + 1/2 ≤ 1  ≡  r ≤ 3/4

Hence we can further expand (n, (l, r)) if 1/4 ≤ l and r ≤ 3/4. This leads to the
definition

    expand (n, (l, r))
      | 1/4 ≤ l ∧ r ≤ 3/4 = expand (n+1, (2 × l - 1/2, 2 × r - 1/2))
      | otherwise         = (n, (l, r))






The function nextBits, to be defined in a short while, will return Nothing on
intervals that straddle 1/2. Consequently, in encode3 we expand intervals (l, r)
satisfying l < 1/2 < r immediately before narrowing. It follows that narrowing
is applied only when l < 1/4 and 1/2 < r, or l < 1/2 and 3/4 < r; in either case,
1/4 < r - l, which is the key inequality.

The converse of expand is given by

    contract (n, (l, r)) = (rescale (n, l), rescale (n, r))

where rescale (n, x) = (x - 1/2)/2^n + 1/2. We leave it as exercises to verify that

    contract · expand = contract
    contract (n, int1 ▷ int2) = contract (n, int1) ▷ int2

Consequently, defining enarrow by

    enarrow ei int2 = (n, int1 ▷ int2)
      where (n, int1) = expand ei

we have contract (enarrow ei int) = contract ei ▷ int. An appeal to fold-fusion
therefore gives

    contract · foldl enarrow ei = foldl (▷) (contract ei)

This identity was used in the motivating calculation above. The remaining step
is to find some suitable definition of nextBits so that

    toBits · contract = concat · unfoldr nextBits

and also that nextBits and enarrow satisfy the streaming condition.
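
The definitions of expand, contract and enarrow transcribe directly into Haskell over exact rationals. The sketch below is only an illustration; the type synonyms and the name narrow (standing for the fractional narrowing operator ▷) are assumptions made for the example.

```haskell
type Interval  = (Rational, Rational)   -- half-open [l, r)
type EInterval = (Int, Interval)        -- expanded interval (n, (l, r))

rescale :: (Int, Rational) -> Rational
rescale (n, x) = (x - 1 / 2) / 2 ^ n + 1 / 2

-- Expand about 1/2 as often as possible.
expand :: EInterval -> EInterval
expand (n, (l, r))
  | 1 / 4 <= l && r <= 3 / 4 = expand (n + 1, (2 * l - 1 / 2, 2 * r - 1 / 2))
  | otherwise                = (n, (l, r))

contract :: EInterval -> Interval
contract (n, (l, r)) = (rescale (n, l), rescale (n, r))

-- Fractional narrowing, written ▷ in the text.
narrow :: Interval -> Interval -> Interval
narrow (l, r) (p, q) = (l + (r - l) * p, l + (r - l) * q)

enarrow :: EInterval -> Interval -> EInterval
enarrow ei int2 = (n, int1 `narrow` int2)
  where (n, int1) = expand ei
```

For example, expand (0, (3/8, 5/8)) is (2, (0, 1)), and contracting that triple gives back (3/8, 5/8), in line with the identity contract · expand = contract.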

The definition of nextBits turns out to be

    nextBits (n, (l, r))
      | r ≤ 1/2   = Just (bits n 0, (0, (2 × l, 2 × r)))
      | 1/2 ≤ l   = Just (bits n 1, (0, (2 × l - 1, 2 × r - 1)))
      | otherwise = Nothing

where bits n b = b : replicate n (1 - b) returns a b followed by a sequence of n
copies of 1 - b. The proof that this definition satisfies all our requirements is left
as an exercise.

Exercise 39. Verify that

    contract · expand = contract
    contract (n, int1 ▷ int2) = contract (n, int1) ▷ int2

Why don’t we have contract · expand = id?

Exercise 40. Prove that

    rescale (n, x) < 1/2  ≡  x < 1/2
    rescale (n, x) > 1/2  ≡  x > 1/2

Hence contract (n, (l, r)) straddles 1/2 iff (l, r) does.






Exercise 41. Prove that

    2 × rescale (n + 1, x) = rescale (n, x) + 1/2
    2 × rescale (n + 1, x) - 1 = rescale (n, x) - 1/2

Exercise 42. Prove by induction on n that

    toBits (2 × rescale (n, l), 2 × rescale (n, r))
      = replicate n 1 ++ toBits (2 × l, 2 × r)
    toBits (2 × rescale (n, l) - 1, 2 × rescale (n, r) - 1)
      = replicate n 0 ++ toBits (2 × l - 1, 2 × r - 1)

Exercise 43. Prove that if l < 1/2 < r then

    toBits (contract (n, (l, r))) = concat (unfoldr nextBits (n, (l, r)))

Exercise 44. Prove that if r ≤ 1/2 then

    toBits (contract (n, (l, r))) = bits n 0 ++ toBits (2 × l, 2 × r)

Similarly, prove that if 1/2 ≤ l then

    toBits (contract (n, (l, r))) = bits n 1 ++ toBits (2 × l - 1, 2 × r - 1)

Hence complete the proof of toBits · contract = concat · unfoldr nextBits.

Exercise 45. Verify that the streaming condition holds for nextBits and enarrow.



8 From fractions to integers

We now want to replace fractional arithmetic by arithmetic with limited-precision
integers. In the final version of arithmetic coding, intervals take the form (l, r),
where l and r are integers in the range 0 ≤ l < r ≤ w and w is a fixed power of
two. This pair represents the interval (l/w, r/w).

Intervals in each model m take the form (p, q, d), where p and q are integers
in the range 0 ≤ p < q ≤ d and d is an integer which is fixed for m and called
the denominator for m. This triple represents the interval (p/d, q/d).

8.1 Integer narrowing

The narrowing function is redefined as follows:

    (l, r) ► (p, q, d) = (l + ⌊(r - l) × p/d⌋, l + ⌊(r - l) × q/d⌋)

Equivalently,

    (l, r) ► (p, q, d) = (l + ((r - l) × p) div d, l + ((r - l) × q) div d)


A reasonable step, you might think, but there are a number of problems with it:

- the revised definition of narrowing completely changes the specification:
  encoding will now produce different outputs than before and, in general, the
  effectiveness of compression will be reduced;

- worse, ► is not associative, and none of the foregoing development applies;

- unless we take steps to avoid it, intervals can collapse to the empty interval
  when ⌊(r - l) × p/d⌋ = ⌊(r - l) × q/d⌋.

The middle point seems the most damaging one, and is perhaps the reason that
writers on arithmetic coding do not attempt to specify what problem arithmetic
coding solves.
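
A small Haskell sketch makes the collapse problem concrete. The names inarrow (for the integer narrowing operator ►) and collapses are chosen for this illustration only.

```haskell
type Interval = (Int, Int)   -- integers l, r representing [l/w, r/w)

-- Integer narrowing, written ► in the text.
inarrow :: Interval -> (Int, Int, Int) -> Interval
inarrow (l, r) (p, q, d) =
  (l + ((r - l) * p) `div` d, l + ((r - l) * q) `div` d)

-- True when narrowing produces an empty interval.
collapses :: Interval -> (Int, Int, Int) -> Bool
collapses int sym = l' >= r'
  where (l', r') = inarrow int sym
```

With d = 10, narrowing the wide interval (0, 64) by (3, 6, 10) gives (19, 38), whereas narrowing the very short interval (0, 2) by (3, 4, 10) gives the empty interval (0, 0).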

8.2 Change of specification 

Fortunately, we can recover all of the previous development. Observe that

    (l, r) ► (p, q, d) = (l/w, r/w) ▷ (p'/d, q'/d)

where the left-hand side is read as the fractional interval it represents, and

    p' = d/(r - l) × ⌊(r - l) × p/d⌋
    q' = d/(r - l) × ⌊(r - l) × q/d⌋

Hence, provided p' < q', integer narrowing of an interval (l, r) by another interval
(p, q) drawn from a model m can be viewed as fractional narrowing of (l, r) by
the corresponding interval (p', q') drawn from an adjusted model adjust (l, r) m.
Note that p' ≤ p and q' ≤ q, so the effect of this adjustment is that some of the
intervals shuffle down a little, leaving a little headroom at the top (see below for
an example). We do not need to implement adjust; the important point is that by
invoking it at every step all of the previous development remains valid.

It is instructive to illustrate the adjustments made to the model. Consider 
Figure 1 in which w = 64 and d = 10. The columns on the left show a given 
sequence of models that might arise after processing symbols in the string ABAC. 
For example, the first row shows a model in which A is associated with the
interval [0.0..0.3), B is associated with [0.3..0.6), and C with [0.6..1.0). The columns
on the right show the corresponding adjusted intervals to three decimal places.
The current intervals immediately before processing the next symbol are shown 
in the middle. The output of the integer implementation is 0010010, while that 
of the real implementation is 00100, so there is a deterioration in compression 
effectiveness even for this short string. 
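
The adjusted bounds in the figure follow from the formula for p'/d, namely ⌊(r - l) × p/d⌋/(r - l). The following Haskell fragment is a sketch for checking individual entries; the function name adjusted is hypothetical.

```haskell
import Data.Ratio ((%))

-- Adjusted lower (or upper) bound p'/d for the current interval (l, r),
-- given a model bound p/d.
adjusted :: (Integer, Integer) -> Integer -> Integer -> Rational
adjusted (l, r) p d = (((r - l) * p) `div` d) % (r - l)
```

For instance, with the current interval (0, 64) the model bound 0.3 becomes 19/64, which is 0.297 to three decimal places, and with (30, 52) the bound 0.8 becomes 17/22, or 0.773.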

8.3 When intervals collapse 

It is left as an exercise to show that

    (∀ p, q : 0 ≤ p < q ≤ d : ⌊(r - l) × p/d⌋ < ⌊(r - l) × q/d⌋)

if and only if d ≤ r - l. Hence we have to ensure that the width of each interval
is at least d before narrowing. But interval expansion guarantees that the width
of each (expanded) interval is greater than w/4 before narrowing, so interval
collapse is avoided if w/4 ≥ d. That was the whole point of making use of interval
expansion.

Since w × d ≤ w × w/4 = 2^(2×e-2) if w = 2^e, we have to ensure that our
limited-precision arithmetic is accurate to 2 × e - 2 bits.

                    models                adjustments
                    A     B     C                           A     B      C
    initial model:  0.0   0.3   0.6       adjust (0,64):    0.0   0.297  0.594
    after A:        0.0   0.4   0.7       adjust (0,38):    0.0   0.395  0.684
    after B:        0.0   0.4   0.8       adjust (30,52):   0.0   0.364  0.773
    after A:        0.0   0.4   0.8       adjust (24,56):   0.0   0.375  0.781
    after C:        0.0   0.5   0.7       adjust (8,64):    0.0   0.500  0.696

    Fig. 1. Model adjustment

Exercise 46. Prove that

    (∀ p, q : 0 ≤ p < q ≤ d : ⌊(r - l) × p/d⌋ < ⌊(r - l) × q/d⌋)

if and only if d ≤ r - l.

Exercise 47. According to the Haskell Report [1], the finite-precision type Int
covers at least the range [-2^29, 2^29 - 1]. What are suitable choices for w and d?


8.4 Final version of encode 



Gathering together the ingredients of this data refinement, we can now give the
final version of encode:

    encode m ei = concat · stream nextBits enarrow ei · encodeSyms m

where

    enarrow ei int2 = (n, int1 ► int2)
      where (n, int1) = expand ei

    expand (n, (l, r))
      | w/4 ≤ l ∧ r ≤ 3 × w/4 = expand (n+1, (2 × l - w/2, 2 × r - w/2))
      | otherwise             = (n, (l, r))

    nextBits (n, (l, r))
      | r ≤ w/2   = Just (bits n 0, (0, (2 × l, 2 × r)))
      | w/2 ≤ l   = Just (bits n 1, (0, (2 × l - w, 2 × r - w)))
      | otherwise = Nothing

Arithmetic coding is now implemented by encode m (0, (0, w)).
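
The final encoder can be transcribed into runnable Haskell as follows. This is a sketch under stated assumptions rather than the authors' code: stream and encodeSyms are assumed to be as in the earlier sections, and the representation of a model as a list of tables, one per step (figureModels, with newModel simply moving to the next table), is a hypothetical device for replaying the model sequence of Figure 1 with w = 64 and d = 10.

```haskell
type Bit       = Int
type Interval  = (Int, Int)                -- [l, r) with 0 <= l < r <= w
type EInterval = (Int, Interval)           -- expanded interval (n, (l, r))
type Table     = [(Char, (Int, Int, Int))] -- symbol and its (p, q, d)
type Model     = [Table]                   -- hypothetical: one table per step

w :: Int
w = 64

stream :: (s -> Maybe (o, s)) -> (s -> i -> s) -> s -> [i] -> [o]
stream f g z xs = case f z of
  Just (y, z') -> y : stream f g z' xs
  Nothing      -> case xs of
    x : xs' -> stream f g (g z x) xs'
    []      -> []

encodeSym :: Model -> Char -> (Int, Int, Int)
encodeSym m s = maybe (error "unknown symbol") id (lookup s (head m))

newModel :: Model -> Char -> Model
newModel m _ = tail m                      -- move on to the next table

encodeSyms :: Model -> String -> [(Int, Int, Int)]
encodeSyms _ []       = []
encodeSyms m (s : ss) = encodeSym m s : encodeSyms (newModel m s) ss

narrow :: Interval -> (Int, Int, Int) -> Interval
narrow (l, r) (p, q, d) =
  (l + ((r - l) * p) `div` d, l + ((r - l) * q) `div` d)

expand :: EInterval -> EInterval
expand (n, (l, r))
  | w `div` 4 <= l && r <= 3 * w `div` 4 =
      expand (n + 1, (2 * l - w `div` 2, 2 * r - w `div` 2))
  | otherwise = (n, (l, r))

enarrow :: EInterval -> (Int, Int, Int) -> EInterval
enarrow ei int2 = (n, int1 `narrow` int2)
  where (n, int1) = expand ei

bits :: Int -> Bit -> [Bit]
bits n b = b : replicate n (1 - b)

nextBits :: EInterval -> Maybe ([Bit], EInterval)
nextBits (n, (l, r))
  | r <= w `div` 2 = Just (bits n 0, (0, (2 * l, 2 * r)))
  | w `div` 2 <= l = Just (bits n 1, (0, (2 * l - w, 2 * r - w)))
  | otherwise      = Nothing

encode :: Model -> EInterval -> String -> [Bit]
encode m ei = concat . stream nextBits enarrow ei . encodeSyms m

-- The model sequence of Figure 1, as cumulative bounds out of d = 10.
figureModels :: Model
figureModels = [table 3 6, table 4 7, table 4 8, table 4 8, table 5 7]
  where table b c = [('A', (0, b, 10)), ('B', (b, c, 10)), ('C', (c, 10, 10))]
```

Evaluating encode figureModels (0, (0, w)) "ABAC" gives the bits 0010010, which agrees with the output quoted for the integer implementation in Section 8.2.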



Exercise 48. Instead of using semi-open intervals [l..r) we could use a closed
interval [l..r - 1]. What modifications are required to the definitions of encode
and decode, and why should such a representation have an advantage over the
semi-open one?

Exercise 49. Notwithstanding everything that has gone before, encoding is not
guaranteed to work with any form of limited-precision arithmetic! Why not?

Exercise 50. Imagine a static model of three equiprobable symbols A, B and C,
so that B is assigned the range [1/3..2/3). Suppose a message of a billion B’s is to
be encoded. What is the output? How big does n get in the definition of expand?
What does this relationship reveal about the answer to the previous exercise?

8.5 Decoding in the integer version 

Decoding with limited-precision arithmetic is again implemented by appeal to
stream inversion, just as in the previous version. Let us start by showing how
to compute the symbol s from bs = encode m ei (s : ss) under the assumption
that nextBits ei = Nothing, so that ei straddles w/2 and expand ei delivers
an interval that will not collapse to the empty interval on narrowing. Setting
wfromBits = (w×) · fromBits, we know that x = wfromBits bs is a fraction in
the interval [0..w) satisfying

    x within contract (enarrow ei (encodeSym m s))

How can we compute s given x, m, and ei? We need to be able to do this in
order to define the helper function nextSym for unstream.

To determine s, we make use of the following property of floors: for all integers
n and fractions x, we have n ≤ ⌊x⌋ ≡ n ≤ x. Ignorance of this simple rule has
marred practically every published paper on arithmetic coding that we have
read.

We now reason:

      x within (contract (enarrow ei (encodeSym m s)))
    ≡   {setting (n, (l, r)) = expand ei}
      x within (contract (n, (l, r) ► encodeSym m s))
    ≡   {setting y = scale (n, x)}
      y within ((l, r) ► encodeSym m s)
    ≡   {setting (p, q, d) = encodeSym m s}
      l + ⌊(r - l) × p/d⌋ ≤ y < l + ⌊(r - l) × q/d⌋
    ≡   {arithmetic}
      ⌊(r - l) × p/d⌋ ≤ y - l < ⌊(r - l) × q/d⌋
    ≡   {rule of floors, setting k = ⌊y⌋}
      ⌊(r - l) × p/d⌋ ≤ k - l < ⌊(r - l) × q/d⌋
    ≡   {arithmetic}
      ⌊(r - l) × p/d⌋ < k - l + 1 ≤ ⌊(r - l) × q/d⌋
    ≡   {rule of floors}
      (r - l) × p/d < k - l + 1 ≤ (r - l) × q/d
    ≡   {arithmetic}
      p ≤ ((k - l + 1) × d - 1)/(r - l) < q
    ≡   {rule of floors}
      p ≤ ⌊((k - l + 1) × d - 1)/(r - l)⌋ < q

Hence, redefining decodeSym to have type Model -> Int -> Symbol, we have

    nextSym (m, ei) x = decodeSym m t
      where t           = ((k - l + 1) × denom m - 1) div (r - l)
            k           = ⌊scale (n, x)⌋
            (n, (l, r)) = expand ei

Armed with this result, we can now tackle the task of inverting encode. First,
as before, we rewrite encode in the form

    encode m ei = concat · stream nextBitsM step (m, ei)

where step (m, ei) s = (newModel m s, enarrow ei (encodeSym m s)) and
nextBitsM carries the model as an extra argument:

    nextBitsM (m, (n, (l, r)))
      | r ≤ w/2   = Just (bits n 0, (m, (0, (2 × l, 2 × r))))
      | w/2 ≤ l   = Just (bits n 1, (m, (0, (2 × l - w, 2 × r - w))))
      | otherwise = Nothing

Now set x = wfromBits (concat (stream nextBitsM step (m, ei) (s : ss))). An
appeal to fold-fusion gives

    wfromBits = foldr pack (w/2)
      where pack b x = (w × b + x)/2

A second appeal to fold-fusion gives

    wfromBits · concat = foldr (⊕) (w/2)

where bs ⊕ x = foldr pack x bs. Moreover, defining

    x ⊖ bs = foldl unpack x bs
      where unpack x b = 2 × x - w × b

we have (bs ⊕ x) ⊖ bs = x by Exercise 14.

All the ingredients for destreaming are now in place, and we can define

    decode m ei bs =
      unstream nextBitsM step nextSym (⊖) (m, ei) (wfromBits bs)

where

    nextSym (m, ei) x = decodeSym m t
      where t           = ((k - l + 1) × denom m - 1) div (r - l)
            k           = ⌊scale (n, x)⌋
            (n, (l, r)) = expand ei

and

    x ⊖ bs = foldl unpack x bs
      where unpack x b = 2 × x - w × b

The one remaining fly in the ointment is that decode is not incremental, as all
elements of bs are inspected in order to compute wfromBits bs.

8.6 A final data refinement

Consider the first invocation of nextSym in the computation decode m ei bs. We
have to compute

    k = ⌊2^n × (wfromBits bs - w/2) + w/2⌋

This can be done without inspecting all of bs. We only need to compute the first
e + n bits, where w = 2^e. This is the clue to making decode incremental.

Suppose we represent bs not by x = wfromBits bs but by a pair (z, rs) where
z is the binary integer formed from take e bs (assuming bs contains at least e
bits) and rs = drop e bs. Then z = ⌊wfromBits bs⌋. If bs contains fewer than
e bits, then we can always append a 1 to bs followed by a sufficient number of
0s. To justify this, recall Exercise 20. Let us call this computation buffering and
write (z, rs) = buffer bs.

Given (z, rs) = buffer bs we can now compute k = fscale (n, (z, rs)), where

    fscale (n, (z, rs)) = foldl (λ x b -> 2 × x + b - w/2) z (take n rs)

The proof is left as an exercise. Hence k can be computed by inspecting only the
first e + n bits of bs.

To install this final refinement we need to show how to compute buffer. There
are two ways to do it and we will need both. The first is to define

    buffer bs = (foldl (λ x b -> 2 × x + b) 0 cs, rs)
      where (cs, rs) = splitAt e (bs ++ 1 : replicate (e - 1) 0)

The definition of z uses the standard method for converting a bit string into a
binary integer. This method is used in the final version of decode.

But we also have to show how to maintain the representation (z, rs) during
the destreaming process. We leave it as an exercise to show that buffer can also
be computed by

    buffer bs = foldr op (w/2, []) bs
    op b (z, rs) = (y, r : rs)
      where (y, r) = (w × b + z) divMod 2

The point of this alternative is that we have

    foldr op (w/2, []) · concat = foldr (⊕) (w/2, [])

where bs ⊕ (x, ds) = foldr op (x, ds) bs. Moreover, we can invert ⊕ by defining
⊖ to be

    (z, rs) ⊖ bs = foldl unop (z, rs) bs
    unop (z, rs) b = (2 × z - w × b + head rs, tail rs)

Now all the ingredients for destreaming are once again in place.

Exercise 51. Show that ⌊scale (n, wfromBits bs)⌋ = fscale (n, buffer bs), where

    fscale (n, (z, rs)) = foldl (λ x b -> 2 × x + b - w/2) z (take n rs)

Exercise 52. Show that

    buffer bs = foldr op (w/2, []) bs
    op b (z, rs) = (y, r : rs)
      where (y, r) = (w × b + z) divMod 2

8.7 Final version of decode 

Here is the final version of decode:

    decode m ei bs =
      unstream nextBitsM step nextSym (⊖) (m, ei) (buffer bs)

    buffer bs = (z, rs)
      where z        = foldl (λ x b -> 2 × x + b) 0 cs
            (cs, rs) = splitAt e (bs ++ 1 : replicate (e - 1) 0)

    nextSym (m, ei) (z, rs) = decodeSym m t
      where t           = ((k - l + 1) × denom m - 1) div (r - l)
            k           = fscale (n, (z, rs))
            (n, (l, r)) = expand ei

    (z, rs) ⊖ bs = foldl unop (z, rs) bs
      where unop (z, rs) b = (2 × z - w × b + head rs, tail rs)

    fscale (n, (z, rs)) = foldl (λ x b -> 2 × x + b - w/2) z (take n rs)

The remaining functions nextBitsM, step, and expand were defined previously.
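
For experimentation, the final decoder can be written out in Haskell as below. This is a sketch under assumptions, not the authors' code: the model is represented, hypothetically, as a list of tables replaying the model sequence of Figure 1 (w = 64, so e = 6, and d = 10), and ominus is an ASCII name for the operator ⊖.

```haskell
type Bit       = Int
type Interval  = (Int, Int)
type EInterval = (Int, Interval)
type Table     = [(Char, (Int, Int, Int))]  -- symbol and its (p, q, d)
type Model     = [Table]                    -- hypothetical: one table per step
type Buffer    = (Int, [Bit])               -- the pair (z, rs)

e, w :: Int
e = 6
w = 2 ^ e

unstream :: (s -> Maybe (o, s)) -> (s -> i -> s) -> (s -> r -> i)
         -> (r -> o -> r) -> s -> r -> [i]
unstream f g h k z x = case f z of
  Just (y, z') -> unstream f g h k z' (k x y)
  Nothing      -> s : unstream f g h k (g z s) x
    where s = h z x

encodeSym :: Model -> Char -> (Int, Int, Int)
encodeSym m s = maybe (error "unknown symbol") id (lookup s (head m))

decodeSym :: Model -> Int -> Char
decodeSym m t = head [s | (s, (p, q, _)) <- head m, p <= t, t < q]

denom :: Model -> Int
denom m = d where (_, (_, _, d)) = head (head m)

newModel :: Model -> Char -> Model
newModel m _ = tail m

narrow :: Interval -> (Int, Int, Int) -> Interval
narrow (l, r) (p, q, d) =
  (l + ((r - l) * p) `div` d, l + ((r - l) * q) `div` d)

expand :: EInterval -> EInterval
expand (n, (l, r))
  | w `div` 4 <= l && r <= 3 * w `div` 4 =
      expand (n + 1, (2 * l - w `div` 2, 2 * r - w `div` 2))
  | otherwise = (n, (l, r))

enarrow :: EInterval -> (Int, Int, Int) -> EInterval
enarrow ei int2 = (n, int1 `narrow` int2)
  where (n, int1) = expand ei

bits :: Int -> Bit -> [Bit]
bits n b = b : replicate n (1 - b)

nextBitsM :: (Model, EInterval) -> Maybe ([Bit], (Model, EInterval))
nextBitsM (m, (n, (l, r)))
  | r <= w `div` 2 = Just (bits n 0, (m, (0, (2 * l, 2 * r))))
  | w `div` 2 <= l = Just (bits n 1, (m, (0, (2 * l - w, 2 * r - w))))
  | otherwise      = Nothing

step :: (Model, EInterval) -> Char -> (Model, EInterval)
step (m, ei) s = (newModel m s, enarrow ei (encodeSym m s))

buffer :: [Bit] -> Buffer
buffer bs = (z, rs)
  where z        = foldl (\x b -> 2 * x + b) 0 cs
        (cs, rs) = splitAt e (bs ++ 1 : replicate (e - 1) 0)

ominus :: Buffer -> [Bit] -> Buffer        -- the operator ⊖
ominus = foldl unop
  where unop (z, rs) b = (2 * z - w * b + head rs, tail rs)

fscale :: (Int, Buffer) -> Int
fscale (n, (z, rs)) = foldl (\x b -> 2 * x + b - w `div` 2) z (take n rs)

nextSym :: (Model, EInterval) -> Buffer -> Char
nextSym (m, ei) (z, rs) = decodeSym m t
  where t           = ((k - l + 1) * denom m - 1) `div` (r - l)
        k           = fscale (n, (z, rs))
        (n, (l, r)) = expand ei

-- The result is partial: take only as many symbols as were encoded.
decode :: Model -> EInterval -> [Bit] -> String
decode m ei bs = unstream nextBitsM step nextSym ominus (m, ei) (buffer bs)

figureModels :: Model
figureModels = [table 3 6, table 4 7, table 4 8, table 4 8, table 5 7]
  where table b c = [('A', (0, b, 10)), ('B', (b, c, 10)), ('C', (c, 10, 10))]
```

Applied to the bit string 0010010 quoted in Section 8.2, the first four symbols of decode figureModels (0, (0, w)) are "ABAC". As with destream, the decoded list must be cut off at the known message length, since nothing in the bit stream marks the end of the message.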

9 Conclusions 

The reader who has followed us up to now will appreciate that there is rather a 
lot of arithmetic in arithmetic coding, and that includes the arithmetic of folds 
and unfolds as well as numbers. As we said at the start, arithmetic coding is 
a simple idea but one that requires care to implement with limited-precision 
integer arithmetic. To the best of our knowledge, no previous description of 
arithmetic coding has ever tackled the formal basis for why the method works, 
let alone providing a formal development of the coding and decoding algorithms. 

Perhaps not surprisingly we went through many iterations of the develop- 
ment, considering different ways of expressing the concepts of streaming and 
stream inversion. The final constructions given above differ markedly from the 
versions given in the Summer School in August, 2002. None of these iterations 
would have been possible without the availability of a functional perspective, 
whose smooth proof theory enabled us to formulate theorems, prove them, and 
perhaps discard them, quite quickly. Whether or not the reader has followed all 
the details, we hope we have demonstrated that functional programming and 
equational reasoning are essential tools of thought for expressing and proving 
properties of complicated algorithms, and that the ability to define structured 
recursion operators, such as foldl, unfoldr, stream and destream, is critical for 
formulating and understanding patterns of computation.

Abstract. Many array-centric algorithms from computational science
and engineering, especially those based on dynamic and irregular data 
structures, can be coded rather elegantly in a purely functional style. The 
challenge, when compared to imperative array languages, is performance. 
These lecture notes discuss the shortcomings of Haskell’s standard arrays 
in this context and present an alternative approach that decouples array 
from list processing and is based on program transformation and generic 
programming. In particular, we will present (1) an array library that 
uses type analysis to achieve unboxing and flattening of data structures 
as well as (2) equational array fusion based on array combinators and 
compiler-driven rewrite rules. We will make use of a range of advanced 
language extensions to Haskell, such as multi-parameter type classes, 
functional dependencies, rewrite rules, unboxed values, and locally state- 
based computations. 

1 Motivation 

Let us start with something simple, namely the dot product of two vectors:

    $v \cdot w = \sum_{i=0}^{n-1} v_i w_i$

In Haskell 98, using the standard array library, we can define the function rather
nicely as

    type Vector = Array Int Float

    (·) :: Vector -> Vector -> Float
    v · w = sum [v!i * w!i | i <- indices v]

Unfortunately, this elegance comes at a considerable price. Figure 1 (next page)
graphs the running time as a function of the length of the input vectors. The
figures were obtained by running the code on a 1.2GHz Pentium III-M under
GNU/Linux compiled with GHC 5.04.1 and optimisation level -O2. At 100,000
elements, the code needs 3µs per vector element, which seems slow for a 1.2GHz
machine.

J. Jeuring and S.P. Jones (Eds.): AFP 2002, LNCS 2638, pp. 27–58, 2003.

© Springer-Verlag Berlin Heidelberg 2003

Figure 1. Running time of computing the dot product in Haskell (in ms), plotted
against the number of elements



This suspicion is easily verified by timing the corresponding C function, which 
is as follows: 

    float dotp (float *v1, float *v2, int n)
    {
      int i;
      float sum = 0;

      for (i = 0; i < n; i++)
        sum += v1[i] * v2[i];
      return sum;
    }

On 100,000 element vectors, the C code runs about 300 times faster!¹

The remainder of these lecture notes will (1) look into the reasons for this 
huge performance difference, (2) propose a slightly different approach to ar- 
ray programming that avoids some of the inherent inefficiencies of the standard 
Haskell approach, and (3) discuss an optimising implementation scheme for the 
new form of array programs. In particular, the presentation includes the detailed 
treatment of an array library that makes use of type analysis (aka generic pro- 
gramming) to achieve unboxing and flattening of data structures. The library 
also optimises array traversals by way of GHC’s rewrite rules. 

2 Where Do the Inefficiencies Come From?

Why is the Haskell code for the dot product so slow? In this section, we will study 
three functions over arrays, which are of increasing sophistication; each function 
will demonstrate one of the shortcomings of Haskell 98’s standard arrays. 

¹ This is compiled with GCC 3.2 using -O2.

2.1 Vector Dot Product: Lists are Slow

Looking at the expression sum [v!i * w!i | i <- indices v] from the dot product
code, two properties stand out: most of the operations involved are actually list
operations and it creates a superfluous intermediate structure, namely the list
produced by the list comprehension and consumed by sum. The suspicion that
these two properties are the main culprits is easily verified by measuring the
running time of an explicitly deforested [Wad88] version of the dot product:

    v · w = loop 0 0
      where
        loop i acc | i > n     = acc
                   | otherwise = loop (i + 1) (v!i * w!i + acc)
        n = snd (bounds v)

This optimisation leads to a dramatic improvement: the explicit loop is about
25 times faster than the original Haskell code. On one hand, this is
good news, as it means that we can optimise quite effectively within Haskell.
On the other hand, we would prefer the compiler to automatically perform the
conversion from the comprehension-based to the loop-based code.
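
For readers who want to try both versions side by side, here is a self-contained sketch using the standard Data.Array module. The names dotpList, dotpLoop and fromList are chosen for this illustration, and both functions assume vectors indexed from 0.

```haskell
import Data.Array (Array, bounds, indices, listArray, (!))

type Vector = Array Int Float

-- Comprehension-based version: builds and consumes an intermediate list.
dotpList :: Vector -> Vector -> Float
dotpList v w = sum [v ! i * w ! i | i <- indices v]

-- Explicitly deforested version: a tail-recursive loop with an accumulator.
dotpLoop :: Vector -> Vector -> Float
dotpLoop v w = loop 0 0
  where
    loop i acc
      | i > n     = acc
      | otherwise = loop (i + 1) (v ! i * w ! i + acc)
    n = snd (bounds v)

-- Helper (hypothetical) for building a 0-based vector from a list.
fromList :: [Float] -> Vector
fromList xs = listArray (0, length xs - 1) xs
```

Both functions compute the same result; any difference in running time depends on the compiler version and flags, so the figures quoted in the text should not be expected to carry over to a modern GHC.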

In summary, the lesson learnt from studying the dot product implementation 
is that the use of lists to drive array computations is highly detrimental to 
performance. But this is not the whole story as we will see during the discussion 
of the next benchmark. 

2.2 Matrix Vector Multiplication: Boxing is Slow 

Let us look at a slightly more complicated function than the dot product, namely
matrix-vector multiplication w = Av where

    $w_i = \sum_{j=0}^{m-1} A_{ij} v_j, \quad 0 \le i \le n-1$

Using the convenient, but slow, list-based interface to arrays, we have

    type Matrix = Array (Int, Int) Float

    mvm :: Matrix -> Vector -> Vector
    mvm a v = listArray (0, n') [sum [a!(i, j) * v!j | j <- [0..m']] | i <- [0..n']]
      where
        (n', m') = snd (bounds a)

Unlike in the case of the dot product, we can only partially avoid the intermediate
lists by using an explicit loop in mvm. More precisely, we can transform the inner
list comprehension together with the application of sum into a loop that is very
much like that for the dot product, but we cannot as easily remove the outer
list comprehension: to create the result array, Haskell 98 forces us to go via a
list. This proves to be rather limiting for mvm. For a range of square matrices,
Figure 2 graphs the running time of mvm for both the version displayed above
as well as the version where the inner list comprehension is replaced by an
explicit loop. The version using an explicit loop is clearly superior, but by a less
significant factor than in the dot product benchmark. A comparison with the
corresponding C function reveals that this is not because the list-based Haskell
code is better, but because optimising the inner loop alone is far from sufficient.
In brief, for an 800 × 800 matrix, the C code runs more than 200 times faster
than the comprehension-based Haskell code and about 120 times faster than the
Haskell code that uses an explicit loop instead of the inner list comprehension.

Figure 2. Running time of matrix-vector multiplication in Haskell (in ms)

Figure 3. Different array representations: (a) unboxed and (b) boxed

Part of the inefficiencies derive from the parametricity of Haskell arrays; i.e., 
the fact that Haskell arrays use the same representation regardless of the type 
of elements contained in the array. Hence, even arrays of primitive values, such 
as integer and floating-point numbers, use a boxed element representation. This 
affects performance negatively for two reasons: the code needs a larger number of 
memory accesses and cache utilisation is worse. The difference in layout between 
boxed and unboxed arrays is depicted in Figure 3. The inefficiencies are caused by
the additional indirection in the boxed array implementation. In addition, these
boxed arrays are lazy, which further increases the number of memory accesses,
reduces cache utilisation, and also makes branch prediction harder for the CPU.

    mvm a v = runST (do
        ma <- newArray_ (0, n)
        outerLoop ma 0
        unsafeFreeze ma
      )
      where
        outerLoop :: STUArray s Int Float -> Int -> ST s ()
        outerLoop ma i | i > n     = return ()
                       | otherwise = do
                           writeArray ma i (loop i 0 0)
                           outerLoop ma (i + 1)
        loop i j acc | j > m     = acc
                     | otherwise = loop i (j + 1) (acc + a!(i, j) * v!j)

Figure 4. Optimised matrix vector multiplication using the ST monad



GHC’s libraries provide an extended implementation of the standard array 
interface, which supports unboxed arrays for basic types.* When we modify the
version of mvm that is based on an explicit loop to use unboxed arrays, the
running time is reduced by approximately 50%. This is a significant improvement,
but it is still more than a factor of 50 slower than the C code. A contributor
to the remaining gap is the lack of inlining of a critical function in the libraries of
GHC 5.04.1. We gain more than another factor of 4 by forcing the inlining of
that function. Finally, by replacing the outer loop and the call to listArray by
a state-based loop using GHC’s mutable arrays, as provided in the ST monad,**
we gain yet another factor of nearly two, as we avoid the list-based interface of 
listArray. 

Overall, if we also disable bounds checking, the Haskell code based on the
ST monad brings us within a factor of 7-8 of the performance of the C code.
This is good news. The bad news is that the resulting code, which is displayed
in Figure 4, is even less readable than the C code.

In summary, we can infer that as soon as new arrays are constructed, unboxed 
arrays are to be preferred for best performance. If we combine this with using a 
stateful programming style for array construction, we get good performance at 
the expense of code clarity. 
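As a point of reference, the code of Figure 4 can be turned into a self-contained program along the following lines. This is only a sketch under stated assumptions, not the benchmark code of these notes: the name mvmU, the derivation of the loop limits from bounds a, and the use of runSTUArray (from Data.Array.ST) in place of unsafeFreeze are choices made here so that the example compiles on its own.

```haskell
import Control.Monad.ST (ST)
import Data.Array.ST (STUArray, newArray_, writeArray, runSTUArray)
import Data.Array.Unboxed

-- Dense matrix vector multiplication over unboxed arrays (hypothetical name).
-- Assumes that the index range of v equals the column range of a.
mvmU :: UArray (Int, Int) Float -> UArray Int Float -> UArray Int Float
mvmU a v = runSTUArray (do
    ma <- newArray_ (r0, rn)      -- uninitialised mutable result vector
    outerLoop ma r0
    return ma)                    -- frozen by runSTUArray without copying
  where
    ((r0, c0), (rn, cm)) = bounds a

    -- One iteration per matrix row.
    outerLoop :: STUArray s Int Float -> Int -> ST s ()
    outerLoop ma i
      | i > rn    = return ()
      | otherwise = do
          writeArray ma i (loop i c0 0)
          outerLoop ma (i + 1)

    -- Accumulates the dot product of row i with v.
    loop :: Int -> Int -> Float -> Float
    loop i j acc
      | j > cm    = acc
      | otherwise = loop i (j + 1) (acc + a ! (i, j) * v ! j)
```

The structure mirrors Figure 4: an outer monadic loop writes one result element per row, and a pure inner loop accumulates the products of that row.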

2.3 Sparse Matrix Vector Multiplication: Nesting is Slow 

Interestingly, there are applications that are tricky to handle in both plain C code
and in the ST monad with unboxed arrays. An example of such an application
is the multiplication of a sparse matrix with a vector. A popular representation
of sparse matrices is the so-called compressed sparse row format [DER86]. This
format represents a sparse row by an array of column-index/value pairs, where
each pair represents a non-zero element of the sparse matrix. An array of such
sparse rows implements a sparse matrix.

* Unboxed arrays are implemented by the data constructor UArray from the module
Data.Array.Unboxed (in GHC 5.04 upward).

** See GHC’s interfaces for Data.Array.MArray and Data.Array.ST. An introduction
to the ST monad is contained in [PL95].

Figure 5. Standard layout of irregular nested arrays

type SparseRow    = Array Int (Int, Float)   -- column index, value
type SparseMatrix = Array Int SparseRow

As an example, consider the following matrix and its compressed array repre- 
sentation: 



5 0 0 0
0 0 0 7
3 4 0 0
0 0 0 0

listArray (0,3) [listArray (0,0)  [(0, 5)],
                 listArray (0,0)  [(3, 7)],
                 listArray (0,1)  [(0, 3), (1, 4)],
                 listArray (0,-1) []]

The numbers that represent actual non-zero values in the matrix are the second
components of the pairs in the array representation.

Based on standard Haskell arrays, we can denote the multiplication of a
sparse matrix with a dense vector, resulting in a new dense vector, as follows:

smvm :: SparseMatrix -> Vector -> Vector
smvm sm vec = listArray bnds
    [sum [x * (vec!col) | (col, x) <- elems row] | row <- elems sm]
    -- the inner sum adds up the products of one row
  where
    bnds = bounds sm

This code again is nice, but lacking in performance. The trouble is that it is not 
clear how to use unboxed arrays to improve the code. GHC provides unboxed 
arrays for primitive element types, such as floating-point and integer numbers. 
However, the sparse matrix representation uses an array of pairs and an array 
of arrays of pairs. 
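To make the boxed formulation concrete, the following sketch runs smvm on the example matrix from above. The definition of Vector as Array Int Float and the name exampleMatrix are assumptions made for this illustration.

```haskell
import Data.Array (Array, listArray, bounds, elems, (!))

type Vector       = Array Int Float          -- assumed dense vector type
type SparseRow    = Array Int (Int, Float)   -- column index, value
type SparseMatrix = Array Int SparseRow

smvm :: SparseMatrix -> Vector -> Vector
smvm sm vec = listArray (bounds sm)
  [sum [x * (vec ! col) | (col, x) <- elems row] | row <- elems sm]

-- The 4x4 example matrix in compressed sparse row format.
exampleMatrix :: SparseMatrix
exampleMatrix = listArray (0, 3)
  [ listArray (0, 0)  [(0, 5)]
  , listArray (0, 0)  [(3, 7)]
  , listArray (0, 1)  [(0, 3), (1, 4)]
  , listArray (0, -1) []
  ]
```

Every element access goes through two levels of boxed arrays plus a boxed pair, which is exactly the indirection that the following sections set out to remove.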

A quite obvious idea is to represent an array of pairs by a pair of arrays. Less 
clear is how to handle nested arrays efficiently. In the case of dense matrices, we 
did avoid using an array of arrays by choosing an array with a two-dimensional
index domain. For sparse matrices, it is more difficult to find an efficient
representation. As depicted in Figure 5, the subarrays are of varying size; hence, a
compact representation will need to include an indexing scheme that takes the 
irregular structure into account. To achieve this, we will discuss a method for 
separating the actual matrix elements from the structure of the sparse matrix 
in the next section. 

In summary, we can draw two conclusions from the sparse matrix example: 
firstly, irregular, nested array structures are more difficult to represent efficiently
than simple flat arrays; and secondly, irregular, nested structures also run into
limitations of imperative languages. Thus, irregular structures seem to be where we
might be able to provide a serious advantage over conventional array languages
by using a functional language in combination with a suitable set of program
transformations.



3 Parallel Arrays and the Flattening Transformation 

So far, it appears as if we can have either beauty or performance. On one hand, 
standard Haskell arrays in combination with list comprehensions and list combi- 
nators enable elegant, but slow, formulations of array algorithms. On the other 
hand, unboxed arrays and the ST monad enable fast, but inelegant implementa- 
tions. The rest of these lecture notes explore an approach to reconciling beauty 
and performance, while simultaneously optimising nested structures, such as 
those needed to represent sparse matrices. 

For optimal convenience, the whole set of program transformations discussed 
in the following should be automated by a compiler. However, our current im- 
plementation is only partial, which means that some transformations need to be 
performed manually. 

3.1 An Alternative Array Notation 

The conclusions drawn in the previous section prompt us to consider an alter- 
native array interface. Firstly, from a semantic viewpoint, the use of unboxed, 
instead of boxed arrays implies a change. More precisely, arrays are no longer 
lazy, but instead have a parallel evaluation semantics, where all elements are 
computed as soon as any of them is needed. Hence, we call the new form of 
arrays parallel arrays.* Secondly, we want to entirely avoid the reliance on list
operations and drop parametrisation of the index domain in favour of simple 
integer indexes. To emphasise the use of the new array interface, we introduce 
a new notation for array types and array comprehension. The new type corre- 
sponds to Haskell arrays as follows: 

type [:a:] = Array Int a      -- parallel arrays

* In fact, these arrays are also well suited for a parallel implementation, but we will
not cover this aspect in these lecture notes.

Moreover, we have the following correspondence between the new and the stan-
dard form of array comprehensions:

[:e1 | p <- e2, q:] = listArray bnds [e1 | p <- elems e2, q']

where the other qualifiers q are treated similarly and bnds depends on the size 
of the list produced by the list comprehension. We assume that for each list 
operation of the Haskell Prelude (that generates finite lists), there exists a
corresponding operation on the array type [:·:]. In particular, we require the existence
of functions lengthP, zipP, filterP, replicateP, and concatP.
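The correspondence between array and list comprehensions can be sketched in standard Haskell as follows. Since the [: :] notation is not standard syntax, the sketch models [:a:] by Array Int a, as in the type equation above; the names PArr, fromListP, and squaresOfEvens are hypothetical and serve only this illustration.

```haskell
import Data.Array (Array, listArray, elems)

-- Stand-in for the parallel array type [:a:].
type PArr a = Array Int a

-- Builds an array whose bounds depend on the size of the produced list.
fromListP :: [a] -> PArr a
fromListP xs = listArray (0, length xs - 1) xs

-- Models the array comprehension [: x * x | x <- arr, even x :].
squaresOfEvens :: PArr Int -> PArr Int
squaresOfEvens arr = fromListP [x * x | x <- elems arr, even x]
```

Note that this model is still boxed and lazy in its elements; it only illustrates the notation, not the intended evaluation semantics of parallel arrays.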

3.2 An Overview of the Transformations 

The implementation scheme that we discuss in the following comprises three 
program transformations: 

1. Code vectorisation: The code is transformed such that it facilitates process- 
ing entire arrays in one sweep. Such collective array operations enable the 
compiler to generate code that has a structure similar to loop-based code in C.

We will discuss this transformation in detail in the present section. 

2. Structure flattening: As we saw in the previous section, boxed arrays are too
expensive. Hence, we need to find a method that transforms arrays of struc- 
tured types into an alternative representation that makes use of unboxed 
arrays. We call this a representation transformation. 

We will discuss some of the basic ideas of this transformation at a later 
point in this section when we illustrate the implementation technique at the 
example of smvm. 

3. Array Fusion: To avoid superfluous intermediate structures and improve 
cache performance, consecutive array traversals need to be amalgamated. 
This is especially important because, as an artifact of the previous two trans-
formations, we will end up with a multitude of simple array operations.
Executing them as they are would lead to extremely poor cache behaviour 
and a large number of intermediate structures. 

Array fusion will be discussed in Section 5. 

In addition to these three general transformations, we need to optimise special 
cases, as for some common operations we can do much better than the general 
rules due to knowledge about the algebraic properties of some functions. In 
these lecture notes, we will not discuss the treatment of special cases in much 
detail, but only illustrate some basic ideas when discussing the implementation 
of smvm. 

3.3 Code Vectorisation 

The essential idea behind code vectorisation is the lifting of code that operates
on individual elements to code that processes entire arrays in bulk. The approach
to vectorisation that we take was introduced in [BS90,CK00]. Here we will only
explain the core idea of the transformation using the example of a restricted set of
functional programs, in such a way that vectorisation can be performed manually.
Nevertheless, the transformation, as presented in the following, is sufficient to
handle typical array codes, such as the sparse matrix vector multiplication from
Section 2.3.

Function definitions
  D -> V V1 ... Vn = E                      (n ≥ 1)
Expressions
  E -> C                                    (constant)
     | V                                    (variable)
     | E E1 ... En                          (application, n ≥ 1)
     | let V = E1 in E2                     (local binding)
     | case E of ◁V1 -> E1; ▷V2 -> E2       (selection)
     | [:E1 | V <- E2:]                     (array comprehension)

Figure 6. Grammar of simple functional programs

Figure 6 displays the grammar of the restricted set of functional programs 
that we consider in this subsection. We allow only top-level function definitions 
and restrict array comprehensions to single generators without filter expressions. 
The latter is not really a restriction as multiple generators and filter expressions
can be expressed by using the standard functions zipP or filterP, respectively. 
There are neither partial applications nor local function definitions. Moreover, 
we restrict ourselves to a two-way case construct that scrutinises binary sum 
types defined as 

data α + β = ◁ α | ▷ β

which corresponds to Haskell’s Either type. All this may seem restrictive, but it 
simplifies the presentation of the vectorisation transformation significantly. 

Function definitions. For each function f that occurs, directly or indirectly,
in an array comprehension, we need to generate a vectorised variant f↑ by the
following scheme that makes use of the lifting transformation L[[·]]:

f  x1 ... xn = e                          -- original definition
f↑ x1 ... xn = L[[e]]^[x1, ..., xn]       -- vectorised version

We assume that for primitive operations, such as (+), (*), and so on, vectorised
versions are already provided. For example, (+↑) adds the elements of two arrays
pairwise:

[:1, 2, 3:] +↑ [:5, 4, 3:] = [:6, 6, 6:]

Vectorised functions, in essence, correspond to mapping the original function
over an array or, in case of a function with multiple arguments, zipping the
arguments and mapping the function over the result.

As a next step, we replace each array comprehension by the lifted version of
the body expression of the comprehension, where we bind the array generator
expression (ge) to the generator variable (x):

[:be | x <- ge:] = let x = ge in L[[be]]^[x]
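The two steps above can be tried out in plain Haskell if ordinary lists stand in for parallel arrays. The following sketch does exactly that; the names plusL, double, doubleL, and doubles are hypothetical, and the suffix L plays the role of the ↑ annotation.

```haskell
-- (+↑): the vectorised addition is the original (+) zipped over both arguments.
plusL :: [Int] -> [Int] -> [Int]
plusL = zipWith (+)

-- Original scalar function, used as [: double x | x <- xs :].
double :: Int -> Int
double x = x + x

-- Vectorised variant double↑: the body is lifted, so (+) becomes (+↑) and
-- the argument x now denotes a whole array.
doubleL :: [Int] -> [Int]
doubleL x = x `plusL` x

-- The comprehension is replaced by binding the generator expression to the
-- generator variable and using the lifted body.
doubles :: [Int] -> [Int]
doubles xs = let x = xs in doubleL x
```

In this list model, doubles xs computes the same result as map double xs, which is the sense in which a vectorised function corresponds to mapping the original one.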

Both the introduction of vectorised functions and the removal of array compre-
hensions make use of the lifting transformation L[[·]]^·, which does most of the
actual work of vectorising code. An application L[[e]]^vs of the lifting transfor-
mation lifts the expression e into vector space within the vectorisation context
vs. This essentially means that all scalar operations are turned into array oper-
ations and all array operations are adjusted to work on irregular collections of
arrays. The vectorisation context contains all those free variables of the lifted
expression that are already lifted. The transformation function L[[·]]^· operates
in a syntax-directed manner, as described in the following.

Constant values. Constant values are lifted by generating an array that has 
the same length as the arrays bound to the variables in the vectorisation context: 



L[[c]]^(v:vs) = replicateP (lengthP v) c



Variables. We distinguish two cases when lifting variables: (1) If the variable is 
in the vectorisation context, it is already lifted as it was bound by the generator 
of a comprehension or is a function argument; (2) otherwise, we need to vectorise 
the value represented by the variable by treating it like a constant value: 

L[[w]]^(v:vs) | w ∈ (v:vs) = w
              | otherwise  = replicateP (lengthP v) w
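The treatment of constants and variables can be illustrated with a small list-based sketch. Lists again stand in for parallel arrays, replicateP and lengthP are modelled by the Prelude functions replicate and length, and the functions scale and scaleL are hypothetical examples.

```haskell
replicateP :: Int -> a -> [a]
replicateP = replicate

lengthP :: [a] -> Int
lengthP = length

-- Scalar code, used as [: scale k x | x <- xs :] with k free in the body.
scale :: Int -> Int -> Int
scale k x = k * x + 1

-- Body lifted in the context [x]: x is in the context and is left unchanged,
-- whereas the free variable k and the constant 1 are replicated to the
-- length of x.
scaleL :: Int -> [Int] -> [Int]
scaleL k x = zipWith (+) (zipWith (*) (replicateP n k) x) (replicateP n 1)
  where
    n = lengthP x
```

The replication is what keeps all operands of the lifted operations at the same length; avoiding the cost of such replications is the subject of the special-case optimisations mentioned in Section 3.2.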



Bindings. Lifting of let bindings is straightforward. The only interesting aspect
is that we need to include the variable name of the bound variable into the
vectorisation context for the body expression.

L[[let v = e1 in e2]]^vs = let v = L[[e1]]^vs in L[[e2]]^(v:vs)



Function application. Lifting of function applications depends on whether
the applied function is already a vectorised function, i.e., whether the name
is of the form f↑. In the case where the applied function is not vectorised, we
replace the function by its vectorised variant and lift the function arguments.
More precisely, we rewrite L[[f e1 ... en]]^vs to f↑ (L[[e1]]^vs) ... (L[[en]]^vs).

Much more interesting is, however, the case where the applied function is
already vectorised. In fact, the treatment of this case is one of the central points
of the flattening transformation, so we need to look into it more closely. Let us, for
a moment, pretend that we are dealing with list operations. We mentioned earlier
that vectorising a function is like mapping the function over an array. Lifting
an already vectorised function is therefore like mapping a mapped function.
Moreover, we know that the following equation holds for every function f and
every list xs:

concat . map (map f) = map f . concat                    (1)

This equation implies that, in some situations, an expression mapping a mapped
function can be replaced by a simple mapping. To exploit this property for lifting
of function applications, we also need to consider the function

segment :: [[α]] -> [β] -> [[β]]

which extracts the nesting structure from a nested list and applies this structure
to another list, such that the following holds:*

segment xs . concat $ xs = xs                            (2)

For example, we have segment [['a', 'b'], [], ['c']] [1, 2, 3] = [[1, 2], [], [3]].

By combining the two equations, we can derive an alternative implementation 
for mapping a mapped function as follows: 

map (map f) $ xs
= {Equation (2)}
segment xs . concat . map (map f) $ xs
= {Equation (1)}
segment xs . map f . concat $ xs
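The derivation can be checked on lists with a direct implementation of segment. The definition below is one possible implementation, written for this illustration; mapMapViaFlat is a hypothetical name for the right-hand side of the derivation.

```haskell
-- Imposes the nesting structure of the first argument on the flat second
-- argument. Assumes the flat list has at least as many elements as the
-- nested one.
segment :: [[a]] -> [b] -> [[b]]
segment []         _  = []
segment (xs : xss) ys = front : segment xss rest
  where
    (front, rest) = splitAt (length xs) ys

-- Mapping a mapped function via a single flat map: strip the nesting with
-- concat, map over the flat list, and restore the nesting with segment.
mapMapViaFlat :: (a -> b) -> [[a]] -> [[b]]
mapMapViaFlat f xss = segment xss (map f (concat xss))
```

On lists, segment has to traverse its arguments; the point of the array representation discussed later is that the corresponding array operations do not.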

What exactly does this mean for the lifting of function applications? It means
that instead of having to vectorise vectorised functions, we can do with plain
vectorised functions if we use concatP to strip a nesting level off all function
arguments and restore this nesting level by using segmentP on the result of the
function application. Hence, we overall get the following transformation rule for
lifting function applications:

L[[f e1 ... en]]^vs
  | f is not vectorised = f↑ el1 ... eln
  | otherwise           = segmentP el1 (f (concatP el1) ... (concatP eln))
  where
    el1 = L[[e1]]^vs
    ...
    eln = L[[en]]^vs

* In Haskell, the dollar operator ($) denotes function application with very low oper-
ator precedence.

There is one caveat, however. Since the transformation introduces concatP and
segmentP operations, we have to ensure that they are efficient. As we will see
later, by choosing an appropriate representation for nested arrays, it is possible
for the two operations to execute in constant time.

Conditionals. To lift a conditional, we have to determine, for each array el-
ement, to which branch it belongs and, based on this, construct different versions
of each array for the two branches. The function getInlFlags :: [:α + β:] -> [:Bool:]
computes a flag array for an array of sum types α + β, such that the flag array
contains True at each position that corresponds to a value of type α and False
at all the other positions in the array (getInrFlags computes the inverse of this
array).

Moreover, the function packP :: [:Bool:] -> [:α:] -> [:α:] drops all elements of
an array that correspond to a False entry in the flag vector. For example, we
have packP [:False, True, False:] [:1, 2, 3:] = [:2:]. Finally, the function

combineP :: [:Bool:] -> [:α:] -> [:α:] -> [:α:]

combines two arrays based on a flag vector. For example, we have

combineP [:False, True, False:] [:2:] [:1, 3:] = [:1, 2, 3:]

Based on these functions, we can lift case constructs as follows: 

L[[case e of ◁v1 -> e1; ▷v2 -> e2]]^vs =
  let
    e'     = L[[e]]^vs
    lflags = getInlFlags e'
    rflags = getInrFlags e'
    e1'    = (L[[e1]]^(v1:vs))[v1 / (packP lflags v1)]
    e2'    = (L[[e2]]^(v2:vs))[v2 / (packP rflags v2)]
  in
  combineP lflags e1' e2'

Here the notation e1[x/e2] means to substitute all occurrences of x in e1 by e2.
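The flag-based functions used in the rule for conditionals can be modelled on lists as follows. This is a sketch for experimentation only: lists stand in for parallel arrays, Haskell's Either plays the role of the sum type, and eitherL is a hypothetical helper that shows how a lifted two-way case splits its input, processes each branch in bulk, and recombines the results.

```haskell
-- True at every position that holds a left (◁) value.
getInlFlags :: [Either a b] -> [Bool]
getInlFlags = map (either (const True) (const False))

-- Keeps exactly the elements whose flag is True.
packP :: [Bool] -> [a] -> [a]
packP flags xs = [x | (True, x) <- zip flags xs]

-- Merges two arrays: a True flag takes the next element of the first array,
-- a False flag the next element of the second.
combineP :: [Bool] -> [a] -> [a] -> [a]
combineP (True  : fs) (x : xs) ys       = x : combineP fs xs ys
combineP (False : fs) xs       (y : ys) = y : combineP fs xs ys
combineP _            _        _        = []

-- Lifted case: fL and gL are the already vectorised branches.
eitherL :: ([a] -> [c]) -> ([b] -> [c]) -> [Either a b] -> [c]
eitherL fL gL es = combineP lflags (fL ls) (gL rs)
  where
    lflags = getInlFlags es
    ls     = [l | Left l  <- es]
    rs     = [r | Right r <- es]
```

Each branch is executed once on a packed array rather than once per element, which is what makes the lifted conditional suitable for bulk execution.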

3.4 Vectorising the Sparse Matrix Vector Multiplication 

Next, let us use the previously introduced transformation rules to vectorise the 
smvm program. We start from the following definition of the sparse matrix vector 
multiplication: 



type SparseRow    = [:(Int, Float):]
type SparseMatrix = [:SparseRow:]
type Vector       = [:Float:]

smvm :: SparseMatrix -> Vector -> Vector
smvm sm vec =
  [:sumP [:(snd ex) * (vec !: (fst ex)) | ex <- row:] | row <- sm:]

We begin the code transformation by lifting the inner array comprehension. The 
calculation is as follows: 

[:(snd ex) * (vec !: (fst ex)) | ex <- row:]
= {Replacing array comprehensions & substituting row for ex}
  L[[(snd row) * (vec !: (fst row))]]^[row]
= {Lift function application}
  (L[[snd row]]^[row]) *↑ (L[[vec !: (fst row)]]^[row])
= {Lift function application and variables}
  (snd↑ row) *↑ ((L[[vec]]^[row]) !:↑ (L[[fst row]]^[row]))
= {Lift function application, bound variables, and free variables}
  (snd↑ row) *↑ ((replicateP (lengthP row) vec) !:↑ (fst↑ row))

We successfully replaced the array comprehension by the use of vectorised func- 
tions, but unfortunately there is a serious problem with this code. The expression 
replicateP (lengthP row) vec produces, for each element of the array row, a com- 
plete copy of vec — this is definitely not what we want! It is in situations like this 
that we need additional optimisation rules, as previously mentioned. The index 
operation, when lifted and applied to a replicated array, can be replaced by a 
reverse permutation operation that honours the following equation: 

(replicateP (lengthP inds) xs) !:↑ inds = bpermuteP xs inds                  (3)

For example, we have bpermuteP [:1, 2, 3:] [:0, 1, 0, 2:] = [:1, 2, 1, 3:]. The func-
tion bpermuteP enjoys another nice property in the context of lifting. This prop-
erty essentially states that, instead of applying a lifted backpermute operation
to a replicated value, we can also do with the simple backpermute:

bpermuteP↑ (replicateP (lengthP inds) xs) inds =
  segmentP inds (bpermuteP xs (concatP inds))                                (4)
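Equations (3) and (4) can be tested on lists. In the sketch below, lists stand in for parallel arrays, indexL models the lifted index operation (!:↑), bpermuteL models bpermuteP↑, and segmentP is implemented by traversal; all definitions and the names eq3Holds and eq4Holds are assumptions of this illustration.

```haskell
-- Back permutation: picks the elements of xs at the given positions.
bpermuteP :: [a] -> [Int] -> [a]
bpermuteP xs inds = [xs !! i | i <- inds]

-- Lifted indexing: one array and one index per position.
indexL :: [[a]] -> [Int] -> [a]
indexL = zipWith (!!)

-- Lifted back permutation: one array and one index array per position.
bpermuteL :: [[a]] -> [[Int]] -> [[a]]
bpermuteL = zipWith bpermuteP

-- Imposes the nesting structure of the first argument on the second.
segmentP :: [[a]] -> [b] -> [[b]]
segmentP []         _  = []
segmentP (xs : xss) ys = front : segmentP xss rest
  where
    (front, rest) = splitAt (length xs) ys

-- Equation (3) for concrete arguments.
eq3Holds :: [Int] -> [Int] -> Bool
eq3Holds xs inds =
  indexL (replicate (length inds) xs) inds == bpermuteP xs inds

-- Equation (4) for concrete arguments.
eq4Holds :: [Int] -> [[Int]] -> Bool
eq4Holds xs inds =
  bpermuteL (replicate (length inds) xs) inds
    == segmentP inds (bpermuteP xs (concat inds))
```

In both equations the right-hand side never materialises the replicated copies of xs, which is why the rules matter for performance.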

We can now continue vectorisation of smvm by removing the outer array com-
prehension (where we already insert the result from the calculation that vec-
torised the inner comprehension):

[:sumP ((snd↑ row) *↑ ((replicateP (lengthP row) vec) !:↑ (fst↑ row)))
 | row <- sm:]
= {Equation (3)}
  [:sumP ((snd↑ row) *↑ (bpermuteP vec (fst↑ row))) | row <- sm:]
= {Replacing array comprehension & substituting sm for row}
  L[[sumP ((snd↑ sm) *↑ (bpermuteP vec (fst↑ sm)))]]^[sm]
= {Lift function application}
  sumP↑ (L[[(snd↑ sm) *↑ (bpermuteP vec (fst↑ sm))]]^[sm])
= {Lift function application (x2) and bound variable; Equation (2)}
  sumP↑ . segmentP sm $
    (snd↑ (concatP sm)) *↑ (concatP (L[[bpermuteP vec (fst↑ sm)]]^[sm]))
= {Lift function application (x2) and variables; Equation (2) & (4)}
  sumP↑ . segmentP sm $
    (snd↑ (concatP sm)) *↑ bpermuteP vec (fst↑ (concatP sm))

Overall, the vectorisation transformation leads us to the following version of
smvm, which has all array comprehensions removed:

smvm sm vec = sumP↑ . segmentP sm $
  (snd↑ (concatP sm)) *↑ bpermuteP vec (fst↑ (concatP sm))

We will see in the next subsection that a suitable choice of representation for
nested arrays provides us with implementations for segmentP, concatP, fst↑,
and snd↑ whose runtime complexity is constant. Moreover, in Section 5, we will
see how the application of sumP↑, (*↑), and bpermuteP can be translated into a
single nested loop over unboxed arrays by means of a set of array fusion rules.
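The vectorised smvm can be compared against the comprehension-based definition in a list model. In the following sketch, lists stand in for parallel arrays, map and zipWith play the role of the lifted operations, and the names smvmV, smvmSpec, and sumL are hypothetical.

```haskell
type SparseRow    = [(Int, Float)]
type SparseMatrix = [SparseRow]
type Vector       = [Float]

segmentP :: [[a]] -> [b] -> [[b]]
segmentP []         _  = []
segmentP (xs : xss) ys = front : segmentP xss rest
  where
    (front, rest) = splitAt (length xs) ys

bpermuteP :: [a] -> [Int] -> [a]
bpermuteP xs inds = [xs !! i | i <- inds]

-- sumP↑: sums every subarray.
sumL :: [[Float]] -> [Float]
sumL = map sum

-- Vectorised version: all arithmetic happens on the flattened matrix.
smvmV :: SparseMatrix -> Vector -> Vector
smvmV sm vec = sumL . segmentP sm $
  zipWith (*) (map snd (concat sm)) (bpermuteP vec (map fst (concat sm)))

-- Comprehension-based reference version.
smvmSpec :: SparseMatrix -> Vector -> Vector
smvmSpec sm vec = [sum [x * (vec !! col) | (col, x) <- row] | row <- sm]
```

Applied to the example matrix [[(0, 5)], [(3, 7)], [(0, 3), (1, 4)], []] and the vector [1, 2, 3, 4], both versions yield [5, 28, 11, 0].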

3.5 Array Representation 

We mentioned before that collection-oriented array operations, and in particular 
lifted operations, are executed most efficiently on flat array structures. Moreover, 
we concluded in the previous subsection that operations that alter the nesting 
structure of arrays, such as segmentP and concatP, need to execute in constant 
time. The flattening transformation achieves both goals by decomposing complex 
data structures into the primitive data values contained in a structure and the 
information that determines the structure of that data. The exact rules of this 
representation transformation are discussed in Section 4, but we would like to sketch
the basic idea here and illustrate it using the example of the sparse matrix vector
multiplication.

The array representation transformation proceeds by choosing an array rep- 
resentation in dependence on the element type stored in an array. The simplest 
case is that of arrays of values of unit type, which can simply be represented 
by the length of the array — such arrays do not contain any other information. 
Moreover, arrays of primitive types can be directly represented as unboxed ar- 
rays. Slightly more interesting are arrays of pairs, which we represent as a pair 
of arrays of equal length. The representation of arrays of sum type requires, 
in addition to two arrays containing the component values, a flag array that 
indicates for each array element to which alternative of the sum the element 
belongs; this is exactly the flag array that the function getInlFlags, mentioned
in the vectorisation rule for the case construct, returns. Finally, nested arrays
are represented by a flat array that collects the elements of all subarrays plus a
so-called segment descriptor, which contains the length of all subarrays.* With
this representation for nested arrays, concatP and segmentP correspond to the
removal and addition of a segment descriptor, respectively. As this does not re-
quire a traversal of the structure itself, these operations have indeed constant
time complexity.

* In fact, to improve the efficiency of some array operations, concrete implementations
of segment descriptors usually contain more information than just the lengths of
all the subarrays. However, to keep the presentation simple, we ignore this for the
moment.

To illustrate the representation transformation, let us look at the represen-
tation of sparse matrices:

type SparseRow    = [:(Int, Float):]
type SparseMatrix = [:SparseRow:]

Since SparseMatrix is a nested array, it is represented by a pair consisting
of a segment descriptor and the representation type of SparseRow, which in
turn, is an array of pairs, and will therefore be represented by a pair of type
([:Int:], [:Float:]). Overall, the matrix

5 0 0 0
0 0 0 7
3 4 0 0
0 0 0 0

corresponds to the structure

([:1, 1, 2, 0:],       -- (simplified) segment descriptor
 ([:0, 3, 0, 1:],      -- position of the values in each subarray
  [:5, 7, 3, 4:]))     -- values of non-zero elements

which consists only of flat arrays that have an immediate unboxed representation.
Interestingly, this transformation can even be applied to recursive occurrences
of array types. The usefulness of such a representation transformation for arrays
is repeatedly mentioned in the literature [BS90,PP93,HM95,CPS98,CK00].

4 A Representation Transformation for Parallel Arrays

In this section, we will formalise the representation transformation that was in-
formally introduced at the end of the previous section. Guided by the element
type of an array, the transformation separates structural information from the
primitive data values contained in a structure. The result is an array represen-
tation that minimises the occurrence of pointers in arrays, in favour of the use
of unboxed arrays.

4.1 Parallel Arrays as a Polytypic Type

In generic programming, a type constructor whose concrete representation de-
pends on its type argument is called a type-indexed type or polytypic type. The
choice of representation for a polytypic type naturally affects the implementa-
tion of functions operating on that type, too. Such functions, whose concrete
implementation depends on a type argument, are called type-indexed functions or
polytypic functions. Due to these polytypic types and functions, generic pro-
gramming achieves extra generality on both the value and the type level. Harper
& Morrisett [HM95], in their work on implementing polymorphism by way of
intensional type analysis, realise type-indexed types and functions by a typecase
construct; i.e., by type and value expressions that choose between a number of
alternatives on the basis of a type argument. In the case of parallel arrays, this
type argument would be the element type of the array. Hinze [Hin00] suggests
an alternative implementation scheme based on the compile-time specialisation
of all polytypic functions and types, which is partly related to the dictionary-
passing implementation of Haskell type classes.

type [:Unit:]       = Int
     [:p:]          = UArr p                      -- unboxed basic array, p ∈ {Int, Float, ...}
     [:τ1 :*: τ2:]  = [:τ1:] :*: [:τ2:]
     [:τ1 :+: τ2:]  = Sel :*: [:τ1:] :*: [:τ2:]
     [:[:τ:]:]      = Segd :*: [:τ:]

type Sel  = [:Bool:]    -- selector
type Segd = [:Int:]     -- segment descriptor

Figure 7. Polytypic definition of parallel arrays

The elimination of pointers, and hence boxed data, from array elements con- 
stitutes a representation transformation, which we can specify by defining the 
array type [: • :] as a type-indexed type [CK00]. This type-indexed type inspects
its type index — the element type of the parallel array — by way of a typecase and 
makes its concrete representation dependent on the structure of the type index. 
In other words, we usually regard parametric type constructors as free functions 
over types; typecase allows us to define type constructors that implement a more 
sophisticated mapping. 

Type-indexed definitions determine types and values by matching the type- 
index against a range of elementary type constructors; in particular, they distin-
guish the unit type, basic types (such as Char, Int, Float, etc.), binary products,
binary sums, the function space constructor, and in our case, also the parallel 
array constructor. The latter needs to be considered for nested arrays, as the 
representation of array nesting is independent of the concrete representation of 
the inner array. 

Figure 7 displays a polytypic definition of parallel arrays, which selects a
concrete array representation in dependence on the array element type. Here
Unit, (:*:), and (:+:) are the type constructors representing units, products, and
sums in generic definitions.* Unit arrays are simply represented by their length.
Arrays of pairs are represented by pairs of arrays. Arrays of sums get an extra
selector component Sel that determines for each element in which of the two
flattened subarrays the data resides. Finally, nested arrays are represented by a
segment descriptor Segd together with a flattened array.

* These are from GHC’s Data.Generics module.
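For the sparse matrix example, the flattened representation can be sketched with GHC's unboxed arrays as follows. The record FlatMatrix and the functions flatten and smvmFlat are hypothetical names introduced for this illustration; the segment descriptor is simplified to the list of row lengths, and UArray Int stands in for the type constructor UArr.

```haskell
import Data.Array.Unboxed (UArray, listArray, elems, (!))

-- Nested array of pairs, flattened into three unboxed arrays.
data FlatMatrix = FlatMatrix
  { segd :: UArray Int Int      -- (simplified) segment descriptor: row lengths
  , cols :: UArray Int Int      -- column index of every non-zero element
  , vals :: UArray Int Float    -- value of every non-zero element
  }

-- Separates the structure of the matrix from its data.
flatten :: [[(Int, Float)]] -> FlatMatrix
flatten rows = FlatMatrix
  { segd = listArray (0, length rows - 1) (map length rows)
  , cols = listArray (0, n - 1) (map fst flat)
  , vals = listArray (0, n - 1) (map snd flat)
  }
  where
    flat = concat rows
    n    = length flat

-- Sparse matrix vector multiplication driven by the segment descriptor.
smvmFlat :: FlatMatrix -> UArray Int Float -> [Float]
smvmFlat m vec = go 0 (elems (segd m))
  where
    go _     []           = []
    go start (len : lens) =
      sum [vals m ! k * vec ! (cols m ! k) | k <- [start .. start + len - 1]]
        : go (start + len) lens
```

For the example matrix, flatten produces the segment descriptor [1, 1, 2, 0], the column indices [0, 3, 0, 1], and the values [5, 7, 3, 4], matching the structure shown in Section 3.5.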



4.2 Polytypic Types as Multi-parameter Type Classes 

Given that we are aiming at an implementation in Haskell, the following ques- 
tion arises: how can we implement the representation transformation encoded 
in the polytypic array definition of Figure 7 in Haskell? Moreover, we need to 
define the basic operations on parallel arrays as type-indexed functions that 
adapt their behaviour to the representation type in dependence on the type in- 
dex. A heavy-weight solution would be to extend the language with a typecase 
construct, as for example done in the TILT compiler [TMC+96]. An alterna-
tive, implementation-intensive approach would be to implement general compile-
time specialised polytypic functions and types, as done in the Generic Haskell
system [CL02]. However, in the interest of simplicity, we prefer to avoid sophis-
ticated language extensions. 

Cheney & Hinze [CH02] recently demonstrated that polytypic functions can
also be implemented in Haskell 98 extended with existential types by encoding
representation types on the value level and using type classes to infer represen-
tations for user-defined data types. This is a flexible and general approach to
generic programming, but it comes at the expense of extra runtime costs, as the
presence of existential types preempts some central optimisations performed by
a compiler like GHC. An alternative implementation of polytypism in Haskell
by way of a slight language extension has been proposed in [HP01]. However,
the proposal in its current form comes with a number of limitations that make
it unsuitable for our purposes. In particular, it currently neither supports multi-
parameter type classes nor generic types of a kind other than *.

As pointed out in [HJL02], multi-parameter type classes in combination with
functional dependencies [Jon00] can be coerced into implementing some forms of
type-indexed types. Generally, this approach has its limitations: firstly, it works 
only for type constructors of some specific kinds and, secondly, the marshalling 
between user-defined data types and the product-sum representation used in 
polytypic definitions has to be manually coded for all instances. The first re- 
striction does not affect us, as parallel arrays belong to the polytypic types that 
can be defined via multi-parameter type classes; and we will see in Subsection 4.6 
that we can work around the second restriction to some extent. 

In the following, we will explore the encoding of type-indexed data types 
by way of multi-parameter type classes with functional dependencies. We start 
by discussing the concrete type mapping needed to implement parallel arrays. 
Afterwards, we shall have a look at a general approach to overcome the require-
ment for manually coded instances for the type classes that encode type-indexed
types.

4.3 The Concrete Implementation of the Type Mapping

A type-indexed data type can be implemented by a binary type class that relates
the type index to the representation type [HJL02]. In other words, to define a
type-indexed type TI of kind κ with type index τ, namely TI⟨τ :: *⟩ :: κ, we
introduce a type class

class TI τ ρ | τ -> ρ

where ρ :: κ. The type class TI essentially denotes a mapping from types τ :: * to
types ρ :: κ and, hence, encodes a type-dependent representation transformation.
This mapping is populated by instance declarations for the class. More precisely,
for each defining equation of the form TI⟨t⟩ = s of the type-indexed type, we
define a class instance

instance TI t s

Finally, the elementary operations on the type-indexed type TI, which need to
be type-indexed functions, are implemented as methods of the class TI.
Following this scheme, we can implement [: • :] using a type class

class PArray e arr | e -> arr, arr -> e

PArray e arr is a bijection between array element types e and array represen-
tation types arr for those elements. In other words, e is a type index and arr
the representation type for arrays that contain values of the type index as ele-
ments. The dependence of the representation type on the type of array elements
is captured by the functional dependency e -> arr, as in the scheme illustrated
previously by TI. Interestingly, we also need to establish a functional depen-
dency in the opposite direction; i.e., the dependence arr -> e. This has serious
implications for the generality of the definition of PArray; we shall return to
the reasons for this functional dependency as well as its implications in the next
subsection.

Given the PArray class declaration, we need to fix the representation types 
and associate them with the corresponding element types by way of instance 
declarations. For example, for arrays of unit values, where it suffices to store the 
length of the array (as all values are the same anyway), we can use 

newtype PAUnit = PAUnit Int   -- length of unit array 
instance PArray Unit PAUnit 

This instance declaration corresponds to the first equation of Figure 7. Moreover, 
arrays of products can be defined as follows: 

data PAProd l r = PAProd l r   -- array of pairs as pair of arrays 
instance (PArray a aarr, PArray b barr) => 
         PArray (a :*: b) (PAProd aarr barr) 

Arrays of basic types have a direct representation as unboxed arrays. Let us 
assume the existence of a type constructor UArr that may be parametrised with 
a range of basic types; i.e., UArr Int denotes arrays of unboxed integer values. 
With this, we can define a PArray instance for basic types as 

newtype PAPrim e = PAPrim (UArr e) 

instance PArray Int (PAPrim Int)   -- etc. for other basic types 
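
The instances above can be tried out in a small, self-contained sketch. It is not the library's code: plain lists stand in for UArr, and the methods fromListP, lengthP, and indexP are chosen here only for illustration (the actual class methods are discussed in the following subsections). The definitions of Unit and :*: are likewise assumed.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies,
             FlexibleInstances, UndecidableInstances, TypeOperators #-}

-- Assumed element-level building blocks (unit and binary product).
data Unit = Unit
data a :*: b = a :*: b

-- Representation types; a list is a hypothetical stand-in for UArr.
newtype PAUnit     = PAUnit Int     -- only the length is stored
newtype PAPrim e   = PAPrim [e]     -- "unboxed" array of a basic type
data    PAProd l r = PAProd l r     -- array of pairs as pair of arrays

-- Binary class relating element type and representation type.
-- The method set is illustrative only.
class PArray e arr | e -> arr, arr -> e where
  fromListP :: [e] -> arr
  lengthP   :: arr -> Int
  indexP    :: arr -> Int -> e

instance PArray Unit PAUnit where
  fromListP xs       = PAUnit (length xs)
  lengthP (PAUnit n) = n
  indexP _ _         = Unit

instance PArray Int (PAPrim Int) where
  fromListP             = PAPrim
  lengthP (PAPrim xs)   = length xs
  indexP  (PAPrim xs) i = xs !! i

-- The product instance needs UndecidableInstances, because the
-- dependencies only hold via the instance context.
instance (PArray a aarr, PArray b barr)
      => PArray (a :*: b) (PAProd aarr barr) where
  fromListP ps = PAProd (fromListP [a | a :*: _ <- ps])
                        (fromListP [b | _ :*: b <- ps])
  lengthP (PAProd l _)   = lengthP l
  indexP  (PAProd l r) i = indexP l i :*: indexP r i
```

With these definitions, an array of type PAProd (PAPrim Int) PAUnit stores a list of integers next to a single length, and indexP reassembles the pair on demand.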

Before completing the set of required instances, let us step back for a moment 
and consider the elementary operations that we need to define as class methods 
of PArray. 

4.4 Immutable Versus Mutable Arrays 

We have seen in Section 2 that for optimal performance, array codes need to be 
implemented with mutable arrays in the ST monad. Our goal here is to define 
PArray such that array algorithms defined in terms of PArray, after inlining and 
similar optimisations, expand into ST monad-based code. Nevertheless, the user- 
level interface of the library should remain purely functional. Hence, we may use 
mutable arrays internally, but must convert them into immutable arrays once 
they are fully defined and returned to user-level code. 

Overall, most array-producing operations proceed as follows: 

1. Allocate a mutable array of sufficient size. 

2. Populate the mutable array with elements using a loop that executes within 
the ST monad. 

3. Coerce the fully defined mutable array into an immutable array. 

Provided that the mutable array is not altered anymore, the last step can (un- 
safely) coerce the type without actually copying the array. 
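
The three steps can be sketched with the array package that ships with GHC, rather than with the UArr/MUArr interface used in these notes. The function name replicateInts is made up for this example. runSTUArray performs the final step: it returns the mutable array as an immutable one without copying it.

```haskell
import Data.Array.ST      (newArray_, writeArray, runSTUArray)
import Data.Array.Unboxed (UArray, elems)

-- Build an immutable unboxed array of n copies of e.
replicateInts :: Int -> Int -> UArray Int Int
replicateInts n e = runSTUArray (do
    ma <- newArray_ (0, n - 1)   -- 1. allocate a mutable array
    fill ma 0                    -- 2. populate it in a loop in ST
    return ma)                   -- 3. frozen in place by runSTUArray
  where
    fill ma off
      | off == n  = return ()
      | otherwise = do
          writeArray ma off e
          fill ma (off + 1)

-- Convenience accessor used to inspect the result.
toList :: UArray Int Int -> [Int]
toList = elems
```

The user-level function replicateInts is pure, although its implementation updates memory in place.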

A detailed introduction to the ST monad, including the outlined strategy 
for implementing immutable by mutable arrays, is provided by Peyton Jones & 
Launchbury [PL95]. Here we will restrict ourselves to the example displayed in 
Figure 8. The code defines the function replicateU , which produces an unboxed 
array of given size where all elements are initialised to the same value. To un- 
derstand the details, we need to have a look at the interface to unboxed arrays, 
which defines the abstract data type UArr e for immutable, unboxed arrays of 
element type e and MUArr s e for mutable, unboxed arrays, where s is the state 
type needed for mutable structures in the ST monad. The type constraint that 
enforces that unboxed arrays can only contain basic types is implemented by 
the type class UAE (which stands for “unboxed array element”). On immutable 
arrays, we have the following basic operations: 

lengthU :: UAE e => UArr e -> Int 
indexU  :: UAE e => UArr e -> Int -> e 

These two functions obtain an array’s length and extract elements, respectively. 

replicateU :: UAE e => Int -> e -> UArr e 
replicateU n e = 
  runST (do 
    ma <- newMU n 
    fill0 ma 
    unsafeFreezeMU ma n) 
  where 
    fill0 ma = fill 0 
      where 
        fill off | off == n  = return () 
                 | otherwise = do 
                     writeMU ma off e 
                     fill (off + 1) 

Figure 8. Typical use of mutable arrays to define an immutable array 



On mutable arrays, we have 

lengthMU :: UAE e => MUArr s e -> Int 
newMU    :: UAE e => Int -> ST s (MUArr s e) 
readMU   :: UAE e => MUArr s e -> Int -> ST s e 
writeMU  :: UAE e => MUArr s e -> Int -> e -> ST s () 

to create new arrays as well as index and update them. Finally, we can convert 
mutable into immutable arrays with 

unsafeFreezeMU :: UAE e => MUArr s e -> Int -> ST s (UArr e) 

where the second argument provides the length of the immutable array. This may 
be less than the length with which the mutable array was originally allocated. 
The structure of this interface was inspired by the unboxed array support of 
GHC’s extension libraries [HCL02]. 



4.5 From Representation Types to Representation Constructors 

Given that we need to handle mutable and immutable arrays, we need to rethink 
the instance declarations for PArray provided in Section 4.3. If we use 

newtype PAPrim e = PAPrim (UArr e) 
instance PArray e (PAPrim e) 

as proposed before, the PArray class is restricted to immutable arrays, as the 
type constructor UArr appears explicitly in the definition of PAPrim. As a 
consequence, we have to define a second, structurally identical class with almost 
identical instance declarations for mutable arrays; i.e., instance declarations for 
types that replace UArr by MUArr. Such duplication of code is obviously not 
desirable. A more elegant solution is to parametrise the definition of the PArray 
instances with the type constructor of the base array. 

type PArr    (arr :: * -> (* -> *) -> *) e = arr e UArr 
type MPArr s (arr :: * -> (* -> *) -> *) e = arr e (MUArr s) 

-- Operations that apply to all parallel arrays 
class PArray e arr | e -> arr, arr -> e where 
  lengthP :: PArr arr e -> Int 
  -- Yield the length of a parallel array (if segmented, number of segments) 
  indexP :: PArr arr e -> Int -> e 
  -- Extract an element out of an immutable parallel array 
  sliceP :: PArr arr e -> Int -> Int -> PArr arr e 
  -- Extract a slice out of an immutable parallel array 

-- Operations that apply only to flat parallel arrays 
class PArray e arr => FArray e arr where 
  newMP :: Int -> ST s (MPArr s arr e) 
  -- Allocate a mutable parallel array of given length 
  writeMP :: MPArr s arr e -> Int -> e -> ST s () 
  -- Update an element in a mutable parallel array 
  unsafeFreezeMP :: MPArr s arr e -> Int -> ST s (PArr arr e) 
  -- Convert a mutable into an immutable parallel array 

Figure 9. Definition of the parallel array classes 

In other words, we make PArray e arr into a bijection between element types 
e :: * and array representation constructors arr :: * -> (* -> *) -> *. The 
representation constructor gets two arguments: (1) an array element type of kind * 
and (2) a type constructor specifying the base array, which is of kind * -> *. The 
main reason for passing the array element type is to avoid ambiguities in type 
signatures, which we will discuss in more detail later. The type constructor for 
the base array can be either UArr or MUArr s and it determines whether we 
obtain a mutable or an immutable array representation. 

Figure 9 introduces two type synonyms PArr and MPArr to simplify the use 
of these generalised array constructors. Moreover, it contains the complete class 
definition of PArray including all the elementary functions of the class. Given the 
previous discussion of the interface for unboxed arrays, most of these functions 
should be self-explanatory. The only new function is sliceP, which extracts a 
subarray, specified by its start index and length, from an immutable array. 

In addition, the class PArray is split into two classes. PArray itself defines 
the type-indexed operations that apply to all forms of parallel arrays, whereas 
FArray defines those type-indexed operations that apply only to flat arrays; 
i.e., to parallel arrays that are not segmented. We will discuss the reason for 
this distinction in more detail below, in conjunction with the representation of 
nested arrays. 

In the following, we discuss the concrete implementation of the individual 
equations defining [:·:] in Figure 7 by means of instance declarations of the class 
PArray. In all instances, the concrete element type e will be used as a phantom 
type; i.e., as a type argument to the array representation constructor that is not 
used on the right-hand side of the constructor’s type definition. The purpose 
of this type argument is to ensure that the concrete element type appears in 
signatures for functions that map arrays to arrays, such as sliceP, and would 
otherwise not mention the element type. The presence of the element type avoids 
ambiguities that would otherwise arise during type checking of overloaded array 
operations. 

To keep the presentation reasonably compact, we will not discuss the defini- 
tion of the methods for the various instances. However, Appendix A describes 
how to obtain a library implementation that covers all details. 

Arrays of units. In this new setting, the instance declaration for Unit reads 

newtype PAUnit e (ua :: * -> *) = PAUnit Int 

instance PArray Unit PAUnit   -- also for FArray 

For reasons of uniformity, PAUnit is parametrised over the base array type ua, 
although ua is not used on the right-hand side of the definition, as we only store 
the length of arrays of units. 



Arrays of primitives. In contrast to arrays of units, the definition for primitive 
types makes use of the base array type: 

newtype PAPrim r e (ua :: * -> *) = PAPrim (ua r) 

instance PArray Char (PAPrim Char)   -- also for FArray 
instance PArray Int  (PAPrim Int)    -- also for FArray 



The constructor of base arrays, ua, is applied to the primitive representation 
type r. Instances are provided for all types in UAE. 

Arrays of products. More interesting is the case of products: 

data PAProd l r e (ua :: * -> *) = forall le re. PAProd (l le ua) (r re ua) 

instance (PArray a aarr, PArray b barr) => 
         PArray (a :*: b) (PAProd aarr barr)   -- also for FArray 

Here the base array type ua is passed down into the component array constructors 
l and r, which are obtained from the type context in the instance 
declaration. The base array type will finally be used when the component arrays 
contain elements of primitive type. The use of the existential types le and re (*), 
which represent the components of the concrete product type, may be surprising. 
These existentials are necessary as placeholders, as we have no means to 
decompose the type e into its components. 

(*) Due to the covariance of functions, existentials are introduced by universal 
quantification, denoted forall. 



Arrays of sums. We treat sums similarly to products, but, in addition to the 
component arrays, we also provide a selector array (see Figure 7), which is 
parametrised with the base array ua in the same way as the component arrays 
are. 

data PASum l r e (ua :: * -> *) = forall le re. 
  PASum (Sel ua) (l le ua) (r re ua) 

instance (PArray a aarr, PArray b barr) => 
         PArray (a :+: b) (PASum aarr barr)   -- also for FArray 



Arrays of arrays. As outlined in Section 3.5, we represent nested arrays by a 
flat array combined with an extra structure, a segment descriptor, that encodes 
the partitioning of the flat array into subarrays. Hence, we have a concrete 
representation and instance as follows: 

data PAPArr arr e (ua :: * -> *) = forall e'. PAPArr (Segd ua) (arr e' ua) 

instance PArray e arr => PArray (PArr arr e) (PAPArr arr) 

In contrast to all the previous cases, we cannot provide an FArray instance 
for PAPArr. This is essentially because the operations newMP and writeMP of the 
FArray class need a more complex signature to deal with array segmentation. For 
example, the length of the segmented array (i.e., the number of segments) is not 
sufficient for newMP to allocate a segmented array structure, as we also need to 
know the total number of elements across all segments to determine the storage 
requirements. Moreover, we cannot simply pass the total number of elements to 
newMP as this does not place an upper bound on the storage requirements of 
the segment descriptor; after all, there may be an arbitrarily large number of 
empty segments. Hence, we need to introduce a more complex operation 

newMSP :: PArray r arr => Int -> Int -> ST s (MSPArr s arr e) 

that receives both the number of segments as well as the total number of elements 
across all segments to allocate a segmented array. Similarly, we provide 

nextMSP :: PArray r arr 
        => MSPArr s arr e -> Int -> Maybe r -> ST s () 

as a replacement for writeMP . If the third argument to nextMSP is Nothing, a 
new segment (working from left to right) is being initialised. All following calls 
to nextMSP write to the new segment. Alternatively, segmented arrays can be 
created by constructing a flat array first, and then combining it with a segment 
descriptor. 

The big picture. It is worthwhile to reflect some more on the generalisation 
of PArray that allows us to cover immutable and mutable array representations 
with a single type mapping. The essential point is the change of the array 
representation type arr, in the mapping PArray e arr, from being a manifest type of 
kind * to becoming a type constructor of kind * -> (* -> *) -> *, and hence from 
being an array representation type to being an array representation constructor. 
With this change, an element type no longer implies an array representation, 
but instead a blueprint that enables the construction of an array representation 
from a type of base arrays. For example, the element type (Int :*: Float) maps 
to the type constructor PAProd (PAPrim Int) (PAPrim Float). If this type 
constructor is applied to UArr, we get the representation type for immutable arrays 
of pairs of integer and float values; however, if it is applied to MUArr s, we get 
the representation for mutable arrays of pairs of integer and float values. Hence, 
PAProd (PAPrim Int) (PAPrim Float) encodes the type structure for flattened 
arrays of integer/float pairs, while leaving the type of the underlying unboxed 
arrays unspecified. 

As a consequence, the bijection enforced by the functional dependencies in 
the type class PArray is not violated, even though a single element type relates 
to the type of both a mutable and an immutable array representation. On the 
level of the array representation constructor, the mapping is still one-to-one. 
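
The following sketch illustrates the idea of a representation constructor in isolation. It is not taken from the library: ImmArr and MutArr are list-based stand-ins for UArr and MUArr s, lengthImm is a helper invented for the example, and an ordinary pair type plays the role of the phantom element type.

```haskell
{-# LANGUAGE KindSignatures, ExistentialQuantification #-}
import Data.Kind (Type)

-- Hypothetical stand-ins for the base array constructors.
newtype ImmArr e   = ImmArr [e]   -- plays the role of UArr
newtype MutArr s e = MutArr [e]   -- plays the role of MUArr s

-- Representation constructors: element type e is a phantom argument,
-- ua is the base array constructor.
newtype PAPrim r e (ua :: Type -> Type) = PAPrim (ua r)
data PAProd l r e (ua :: Type -> Type) =
  forall le re. PAProd (l le ua) (r re ua)

-- The same blueprint instantiated with different base arrays.
type PArr    (arr :: Type -> (Type -> Type) -> Type) e = arr e ImmArr
type MPArr s (arr :: Type -> (Type -> Type) -> Type) e = arr e (MutArr s)

-- Blueprint for flattened arrays of Int/Double pairs.
type IntDoubleArr = PAProd (PAPrim Int) (PAPrim Double)

immPairs :: PArr IntDoubleArr (Int, Double)
immPairs = PAProd (PAPrim (ImmArr [1, 2, 3])       :: PAPrim Int Int ImmArr)
                  (PAPrim (ImmArr [0.5, 1.5, 2.5]) :: PAPrim Double Double ImmArr)

mutPairs :: MPArr s IntDoubleArr (Int, Double)
mutPairs = PAProd (PAPrim (MutArr [1, 2, 3])       :: PAPrim Int Int (MutArr s'))
                  (PAPrim (MutArr [0.5, 1.5, 2.5]) :: PAPrim Double Double (MutArr s'))

-- Length of an immutable product array whose first component is primitive.
lengthImm :: PArr (PAProd (PAPrim a) r) e -> Int
lengthImm (PAProd (PAPrim (ImmArr xs)) _) = length xs
```

Both immPairs and mutPairs are built from the single constructor IntDoubleArr; only the base array argument differs.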

4.6 Embedding Projection Pairs 

In the encoding of polytypic arrays into type classes, we only provided instances 
for elementary type constructors (products, sums, and so on). Obviously, this by 
itself is not sufficient to use arrays of user-defined data types, despite the earlier 
claim that all algebraic data types can be represented by a combination of these 
elementary type constructors. In fact, the situation is worse: given our definition 
of the class PArray, it is impossible to use the same array representation for two 
isomorphic element types. In other words, we cannot represent [:():] and [:Unit:] 
in the same way. 

The reason is the bijection required by the functional dependencies in 

class PArray e arr | e -> arr, arr -> e 

The dependency arr -> e asserts that any given array representation is used for 
exactly one element type. Hence, PAUnit can only be used to represent either 
[:():] or [:Unit:], but not both. This naturally leads to the question of whether 
the dependency arr -> e is really needed. The answer is yes. If the dependency 
is omitted, some class methods do not type check for recursive types, such as 
products and sums. The exploration of the precise reason for this is left as 
an exercise, which requires studying the library implementation referenced in 
Appendix A. 
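
The effect of the second dependency can be observed in a cut-down sketch. The class below keeps only one illustrative method and uses representation types of kind *, so it is a simplification of the class in Figure 9, not the library's definition.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}

data Unit = Unit
newtype PAUnit = PAUnit Int

-- Both directions are functional dependencies, i.e. a bijection.
class PArray e arr | e -> arr, arr -> e where
  lengthP :: arr -> Int   -- e is determined by arr, so this is unambiguous

instance PArray Unit PAUnit where
  lengthP (PAUnit n) = n

-- Uncommenting the next instance makes GHC reject the module, because
-- PAUnit would then be related to two element types, which conflicts
-- with the dependency arr -> e:
--
-- instance PArray () PAUnit where
--   lengthP (PAUnit n) = n

unitLength :: Int
unitLength = lengthP (PAUnit 3)
```

Note that lengthP can only be given the type arr -> Int because of the dependency arr -> e; without it, the element type could not be inferred at use sites.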

Fortunately, we can regain the ability to use a single array representation for 
multiple element types via a different route. The idea is to regard the element 
type of PArray merely as the representation type of the actual element type. 
We relate the actual element type with its representation type via another type 
class that defines a so-called embedding projection pair (or EP, for short), which 
is a pair of functions, from and to, that map elements from the actual to the 
representation type and back. So, we have 

class EP t r | t -> r where 
  from :: t -> r 
  to   :: r -> t 

where t is the actual type and r its representation type. Note that we have a 
functional dependency from the actual to the concrete type only. This implies 
that for each actual type, we uniquely define a representation type. However, 
as we do not have a functional dependency in the opposite direction, a single 
representation type may represent many actual types. In particular, we have, as 
the simplest cases, 

instance EP Unit Unit where 
  from = id 
  to   = id 

instance EP () Unit where 
  from ()   = Unit 
  to   Unit = () 

Equipped with EP , we can now use PArray for a wide range of element types 
by combining the two type classes in a type context (EP e r, PArray r arr). This 
may be read as the actual element type e is represented by the representation 
type r, which in turn uniquely maps on an array representation arr; as illustrated 
in the following diagram: 

      EP           PArray 
e ----------> r ----------> arr 

In other words, we use EP to convert between user-level data types and 
canonical representations as product-sum types. Arrays are, then, defined over 
the canonical representation. 
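
As a small, self-contained illustration of embedding projection pairs, the sketch below adds an instance for Bool, which is not given in the text; the definitions of Unit and the sum type :+: (with constructors Inl and Inr) are assumed here for the sake of the example.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies,
             FlexibleInstances, TypeOperators #-}

-- Assumed canonical building blocks.
data Unit    = Unit          deriving (Eq, Show)
data a :+: b = Inl a | Inr b deriving (Eq, Show)

class EP t r | t -> r where
  from :: t -> r
  to   :: r -> t

instance EP Unit Unit where
  from = id
  to   = id

instance EP () Unit where
  from ()   = Unit
  to   Unit = ()

-- A user-level type mapped to a sum of units (illustrative instance).
instance EP Bool (Unit :+: Unit) where
  from False = Inl Unit
  from True  = Inr Unit
  to (Inl Unit) = False
  to (Inr Unit) = True
```

Since there is no dependency from r to t, both () and Unit share the representation Unit. For the same reason, a use of to needs a type annotation, as in (to Unit :: ()), to select the actual type.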

5 Array Fusion 

So far, we have not discussed the third transformation listed in Section 3.2. 
Array programs that are expressed by means of collective array combinators 
have a tendency to produce intermediate arrays that are immediately consumed 
by the operation following their generation. For example, the vectorised version 
of smvm, from Section 3.4, essentially constitutes a three stage pipeline of array 
traversals. The stages are implemented by bpermuteP , zipWithP (*), and sumSP. 

Such a decomposition of complex computations into pipeline stages makes 
the code more readable, but it also limits performance. The intermediate 
arrays consume resources and affect cache locality negatively. In [CK01], we 
introduce an approach to equational array fusion that is based on GHC's rewrite 
rules [PHT01]. In the following, we will provide an overview of this approach. 

5.1 Loops 

There exists a plethora of array combinators; so, any attempt to consider the 
fusion of all possible pairs of combinators would lead to an unmanageable num- 
ber of transformation rules. Hence, as in the case of list fusion (i.e., shortcut 
deforestation), we need a small set of elementary combinators that serve as 
building blocks for the others; in the case of lists, these combinators are build 
and foldr [GLP93]. Then, all that remains is to define fusion for these elemen- 
tary combinators. In combination with inlining, this also takes care of the non- 
elementary combinators that are defined in terms of the elementary ones. 

As in the case of build and foldr for lists, we need an array constructing and 
an array consuming function. As the constructing function, we use 

replicateP :: (EP e r, FArray r arr) => Int -> e -> PArr arr e 

It generates an array of given length where all elements are initialised to the 
same value. However, the array consuming function is more involved. As in the 
case of foldr for lists, we require that the function can describe both mapping 
operations as well as reductions. Moreover, it needs to be able to deal with a 
running accumulator. All this functionality is integrated into a generalised loop 
combinator: 

loopP :: (EP e r, PArray r arr, EP e' r', FArray r' arr') 
      => (acc -> e -> (acc, Maybe e'))   -- mapping & folding elements 
      -> acc                             -- initial accumulator value 
      -> PArr arr e                      -- consumed array 
      -> (PArr arr' e', acc) 

The versatility of loopP becomes clear when considering the implementation of 
mapping, reduction, prescan, and filtering in Figure 10. Moreover, the definition 
of enumFromToP (in the same figure) demonstrates how loopP can implement 
complex generators when combined with replicateP. This particular 
combination of loopP and replicateP may appear wasteful due to the intermediate 
array generated by replicateP. However, the representation transformation 
from Section 4.3 assigns to arrays of unit type the concrete definition PAUnit, 
which represents unit arrays simply by their length. Thus, the array created by 
replicateP (to - from + 1) Unit is represented by nothing other than a plain 
number, which serves as an upper bound for the iteration encoded in loopP. 
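
To experiment with the loop combinator without the array library, its behaviour can be modelled on ordinary lists. The sketch below is such a model, not the library's implementation: loopL and the derived functions are names invented here, lists replace parallel arrays, and () replaces Unit. The accumulator is passed as the first argument of the mutator, as in the signature of loopP.

```haskell
import Data.List  (mapAccumL)
import Data.Maybe (catMaybes)

-- List model of loopP: thread an accumulator through the input and
-- keep only those results that are wrapped in Just.
loopL :: (acc -> e -> (acc, Maybe e')) -> acc -> [e] -> ([e'], acc)
loopL mf start xs =
  let (acc, ys) = mapAccumL mf start xs
  in  (catMaybes ys, acc)

mapL :: (e -> e') -> [e] -> [e']
mapL f = fst . loopL (\_ e -> ((), Just (f e))) ()

foldlL :: (b -> a -> b) -> b -> [a] -> b
foldlL f z = snd . loopL (\a e -> (f a e, Nothing :: Maybe ())) z

-- Prescan: emits the accumulator as it was before each element.
scanlL :: (b -> a -> b) -> b -> [a] -> [b]
scanlL f z = fst . loopL (\a e -> (f a e, Just a)) z

filterL :: (e -> Bool) -> [e] -> [e]
filterL p = fst . loopL (\_ e -> ((), if p e then Just e else Nothing)) ()

-- Generator: loop over a list of units whose length bounds the iteration.
enumFromToL :: Enum e => e -> e -> [e]
enumFromToL from to =
  fst . loopL (\a _ -> (succ a, Just a)) from
      $ replicate (fromEnum to - fromEnum from + 1) ()
```

In this model, scanlL (+) 0 [1, 2, 3] yields [0, 1, 3], which shows the prescan behaviour: the final sum is available only as the accumulator result.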

5.2 Fusion Rules 

In the following, we denote rewrite rules as follows: 

(rule name)  ∀ v1 ... vn. exp1 ↦ exp2 

where the vi are the free variables in the rules. These rules should be read as 
replace every occurrence of exp1 by exp2. GHC permits embedding such rules 
directly into library code [PHT01]. 
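
In GHC source code, such a rule is written as a RULES pragma. The following sketch shows the concrete syntax for a simple map/map rule on lists; mapL is a made-up function for this example and is not part of the array library. Rules are applied only when optimisation is enabled, and the NOINLINE pragma keeps mapL from being inlined before the rule can match.

```haskell
-- A list-based stand-in for an array combinator (hypothetical name).
mapL :: (a -> b) -> [a] -> [b]
mapL f xs = map f xs
{-# NOINLINE mapL #-}

-- Replace every occurrence of the left-hand side by the right-hand side.
{-# RULES
"mapL/mapL" forall f g xs. mapL f (mapL g xs) = mapL (f . g) xs
  #-}

-- A pipeline of two traversals that the rule can turn into one.
incDoubled :: [Int] -> [Int]
incDoubled xs = mapL (+ 1) (mapL (* 2) xs)
```

GHC does not check that a rule preserves meaning; it is up to the library author to ensure that both sides are equivalent, as they are here.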

mapP :: (EP e r, PArray r arr, EP e' r', PArray r' arr') 
     => (e -> e') -> PArr arr e -> PArr arr' e' 
mapP f = fst . loopP (\_ e -> (Unit, Just $ f e)) Unit 

foldlP :: (EP a r, PArray r arr) => (b -> a -> b) -> b -> PArr arr a -> b 
foldlP f z = snd . loopP (\a e -> (f a e, Nothing)) z 

scanlP :: (EP a r, PArray r arr) => (b -> a -> b) -> b -> PArr arr a -> arr 
scanlP f z = fst . loopP (\a e -> (f a e, Just a)) z 

filterP :: (EP e r, FArray r arr) => (e -> Bool) -> PArr arr e -> PArr arr e 
filterP p = fst . loopP (\_ e -> (Unit, if p e then Just e else Nothing)) Unit 

enumFromToP :: (Enum e, EP e r, FArray r arr) => e -> e -> PArr arr e 
enumFromToP from to = 
  fst . loopP (\a _ -> (succ a, Just a)) from . replicateP (to - from + 1) $ Unit 

Figure 10. Standard combinators in terms of loopP 



The first and simplest fusion rule encodes an optimisation that is related 
to the discussion of enumFromToP from Figure 10 in the previous subsection. 
We can transform any occurrence of a replicateP followed by a loopP into a 
modified loopP / replicateP combination where replicateP only produces a unit 
array, which effectively eliminates the overhead of the intermediate array: 

(loopP/replicateP fusion)  ∀ mf start n e. 
  loopP mf start (replicateP n e)  ↦ 
  loopP (\acc _ -> mf acc e) start (replicateP n Unit) 

Fusion of two consecutive loops is more involved. It requires combining the 
two mutator functions (first argument to loopP) into one. This is achieved by 
the following rule: 

(loopP/loopP fusion)  ∀ mf1 start1 mf2 start2 arr. 
  loopP mf2 start2 (loopArr (loopP mf1 start1 arr))  ↦ 
    let 
      mf (acc1, acc2) e = 
        case mf1 acc1 e of 
          (acc1', Nothing) -> ((acc1', acc2), Nothing) 
          (acc1', Just e') -> case mf2 acc2 e' of 
                                (acc2', res) -> ((acc1', acc2'), res) 
    in 
    loopSndAcc (loopP mf (start1, start2) arr) 

The accumulator of the combined loop maintains the two components of the orig- 
inal loops as a product and sequences the two mutators. The function loopSndAcc 
drops the accumulator result that corresponds to the first loop. 

loopSndAcc :: (arr, (acc1, acc2)) -> (arr, acc2) 
loopSndAcc (arr, (acc1, acc2)) = (arr, acc2) 

Further rules are needed to handle loops separated by zipP as well as loops 
over segmented operations. For details, see [CK01]. 
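
The loop fusion rule can be checked on a list model of the loop combinator. The sketch below is only such a model under stated assumptions: loopL stands in for loopP on lists, the mutators take the accumulator first (as in the signature of loopP), and the names fuseMutators, unfused, and fused are invented for this example.

```haskell
import Data.List  (mapAccumL)
import Data.Maybe (catMaybes)

-- List model of loopP (hypothetical stand-in).
loopL :: (acc -> e -> (acc, Maybe e')) -> acc -> [e] -> ([e'], acc)
loopL mf start xs =
  let (acc, ys) = mapAccumL mf start xs
  in  (catMaybes ys, acc)

-- Combine two mutators: the accumulators are kept as a pair, and the
-- second mutator runs only if the first one produced an element.
fuseMutators :: (a1 -> e  -> (a1, Maybe e'))
             -> (a2 -> e' -> (a2, Maybe e''))
             -> (a1, a2) -> e -> ((a1, a2), Maybe e'')
fuseMutators mf1 mf2 (acc1, acc2) e =
  case mf1 acc1 e of
    (acc1', Nothing) -> ((acc1', acc2), Nothing)
    (acc1', Just e') -> case mf2 acc2 e' of
                          (acc2', res) -> ((acc1', acc2'), res)

-- Left-hand side of the rule: two traversals, one intermediate list.
unfused :: (a1 -> e -> (a1, Maybe e')) -> a1
        -> (a2 -> e' -> (a2, Maybe e'')) -> a2
        -> [e] -> ([e''], a2)
unfused mf1 s1 mf2 s2 = loopL mf2 s2 . fst . loopL mf1 s1

-- Right-hand side of the rule: a single traversal.
fused :: (a1 -> e -> (a1, Maybe e')) -> a1
      -> (a2 -> e' -> (a2, Maybe e'')) -> a2
      -> [e] -> ([e''], a2)
fused mf1 s1 mf2 s2 xs =
  let (ys, (_, acc2)) = loopL (fuseMutators mf1 mf2) (s1, s2) xs
  in  (ys, acc2)
```

For instance, filtering the even numbers out of [1..10] and then computing running sums gives ([2, 6, 12, 20, 30], 30) with both unfused and fused, but fused traverses the input only once.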

6 Concluding Remarks 

We discussed an approach to purely functional array programming that uses 
program transformation to turn elegant high-level code into efficient low-level 
implementations. In particular, we covered code vectorisation in combination 
with an array representation transformation based on techniques from generic 
programming. The resulting code is much faster than what can be achieved with 
Haskell’s standard array library and, for simple examples, comes within a factor 
of 3-4 of the performance of C programs. 

We have omitted concrete benchmark figures for parallel arrays on purpose. 
The appendix contains a set of programming exercises, which include the imple- 
mentation of the benchmarks from Section 2 using parallel arrays. We did not 
want to anticipate the solution to these exercises and instead included compar- 
ative benchmarks as part of these exercises. 

As of the writing of these lecture notes, code vectorisation has to be manually 
performed for Haskell programs. In contrast, the array representation transfor- 
mation and array fusion are automated by a library for the Glasgow Haskell 
Compiler. The appendix contains instructions on how to obtain this library as 
well as a set of programming exercises that illustrate how to use the techniques 
introduced in this text. 

Acknowledgements. We would like to thank the participants of the Summer School 
on Advanced Functional Programming 2002, and especially Hal Daumé III and 
Simon Peyton Jones, for feedback on these lecture notes. 



A Exercises 

The following programming exercises reinforce the concepts and approach pre- 
sented in these lecture notes. Moreover, they include the setup needed to per- 
form comparative benchmarks that demonstrate the performance improvement 
of parallel arrays over standard Haskell arrays as well as indicate how much effi- 
ciency is sacrificed by using Haskell with parallel arrays instead of plain C code. 
The support software, installation instructions, and solutions to the exercises are 
available from the following website: 

http://www.cse.unsw.edu.au/~chak/afp02/ 

The website also includes references to further material as well as extensions to 
what is described in these lecture notes. 

A.1 Warm Up Exercise 

After installing the support software from the website, start by interactively 
exploring the types and functions of the parr library in GHCi. The website 
contains some exploratory exercises as well as instructions on library features 
that were added after the writing of these lecture notes. 

The high-level array computation sumP [:x * x | x <- [:1..n:]:] can be 
implemented by the following function definition: 

PArray> let sumSq :: Int -> Int = sumP . mapP (\x -> x * x) . enumFromToP 1 

PArray> sumSq 100 

338350 

This function is a nice example of the gains that fusion promises in the 
extreme case (*). More about this is on the website. 

A.2 A Simple Benchmark 

As a first benchmark consider the computation of the dot product: 

  v \cdot w = \sum_{i=0}^{n-1} v_i w_i 


To save you the effort required to write benchmark support routines, some code 
is provided in the directory labkit/ of the support software. In particular, the 
module BenchUtils.hs contains benchmarking support and DotP.hs contains 
a skeleton for the dot-product benchmark. Just fill in the missing pieces under 
the heading “Benchmarked code”. Then, compile and link the code with the 
command at the top of DotP.hs. This command contains all the options needed 
to put GHC’s optimisation engine into high gear. Run the executable dotp to 
execute your dot-product code on vectors of between 100,000 and 500,000 elements. 

If you would like to compare the execution time with that of a whole range of dot 
product implementations in Haskell plus one in C, compile and run the code 
provided in the files DotProd.hs, dotprod.h, and dotprod.c. For instructions 
on how to compile and run the files, see the file headers. Note that the C function 
is called from the Haskell code via the foreign function interface; i.e., you need 
to compile the C code before compiling and linking the Haskell module. 

(*) The sumSq example from Section A.1, in fact, appears on the first page of 
Wadler’s deforestation paper. 

A.3 Matrix Vector Multiplication 

The following definition of the multiplication of a sparse matrix with a vector is 
given in the high-level array notation of Section 3.1: 

type SparseRow    = [:(Int, Double):] 
type SparseMatrix = [:SparseRow:] 
type Vector       = [:Double:] 

smvm :: SparseMatrix -> Vector -> Vector 
smvm sm vec = [:sumP [:x * vec !: col | (col, x) <- row:] | row <- sm:] 

Section 3.4 demonstrates how to vectorise the above, resulting in code that can 
be implemented by way of the parallel array library. Implement the vectorised 
version of smvm by extending the file labkit/smvm/SMVM_fusion.hs, which 
together with the other files in the same directory and BenchUtils.hs implements 
four versions of the code (two using Haskell arrays and two using parallel arrays). 
Compile these files using the same optimisation options as for the dot product 
benchmark. Note that the module SMVM_optimal.hs implements the code that 
corresponds to the situation where the fusion rules optimise the vectorised code 
optimally. 

Hint: Use unzipP where the vectorised code in the lecture notes uses fst^ 
and snd^. 

A.4 Advanced Exercises 

Prime numbers. The following implementation of the Sieve of Eratosthenes is 
interesting as it constitutes a data parallel algorithm that parallelises both loops 
of the sieve at once: 



primes :: Int -> [:Int:] 
primes n | n <= 2    = [::] 
         | otherwise = 
  let 
    sqrPrimes = primes (ceiling (sqrt (fromIntegral n))) 
    sieves    = concatP [:[:2 * p, 3 * p .. n - 1:] | p <- sqrPrimes:] 
    sieves'   = zipP sieves (replicateP (lengthP sieves) False) 
    flags     = bpermuteDftP n (const True) sieves' 
  in 
  dropP 2 [:i | i <- [:0..n - 1:] | f <- flags, f:] 



Implement this code with the help of the parallel array library. 



Quicksort. The quicksort algorithm can be expressed in terms of array com- 
prehensions quite conveniently: 



qsort :: Ord a => [:a:] -> [:a:] 
qsort [::] = [::] 
qsort xs   = 
  let 
    m      = xs !: (lengthP xs `div` 2) 
    ss     = [:s | s <- xs, s < m:] 
    ms     = [:s | s <- xs, s == m:] 
    gs     = [:s | s <- xs, s > m:] 
    sorted = [:qsort xs' | xs' <- [:ss, gs:]:] 
  in 
  (sorted !: 0) +:+ ms +:+ (sorted !: 1) 

Try to vectorise this code using the scheme introduced in Section 3. Interestingly, 
vectorisation turns the tree-shaped call graph of qsort into a linear one, so that 
all invocations that are on one level of the tree are executed by one invocation 
of the vectorised variant qsort^ operating on segmented arrays. Implement the 
vectorised code using the parallel array library. This may require implementing 
some additional combinators in terms of loopP and loopSP. 

1 Introduction 



It is a very undesirable situation that today’s software often contains errors. 
One motivation for using a functional programming language is that it is more 
difficult (or even impossible) to make low-level mistakes, and it is easier to reason 
about programs. But even the most advanced functional programmers are not 
infallible; they misunderstand the properties of their own programs, or those of 
others, and so commit errors. 

We therefore aim to provide functional programmers with tools for testing and 
tracing programs. In broad terms, testing means first specifying what behaviour 
is acceptable in principle, then finding out whether behaviour in practice matches 
up to it across the input space. Tracing means first recording the internal details 
of a computation, then examining what is recorded to gain insight, to check 
hypotheses or to locate faults. Although we have emphasised the motivation of 
eliminating errors, tools for testing and tracing can often be useful even for pro- 
grammers who rarely make mistakes. For example, the increased understanding 
gained by testing and tracing can lead to improved solutions and better docu- 
mentation. 

In these lecture notes we concentrate on QuickCheck [3,4], a tool for testing 
Haskell programs, and Hat [15,2], a tool for tracing them. Each tool is useful 
in its own right but, as we shall see, they are even more useful in combination: 
testing using QuickCheck can identify failing cases, tracing using Hat can reveal 
the causes of failure. 

Section 2 explains what QuickCheck is and how to use it. Section 3 similarly 
explains Hat. Section 4 shows how to use QuickCheck and Hat in combination. 
Section 5 outlines a much larger application than those of earlier sections, and 
explains some techniques for testing and tracing more complex programs. Sec- 
tion 6 discusses related work. Section 7 details almost twenty practical exercises. 

Source programs and other materials for the examples and exercises in these
notes can be obtained from http://www.cs.york.ac.uk/fp/afp02/.

2 Testing Programs with QuickCheck

In this section we give a short introduction to QuickCheck¹, a system for
specifying and randomly testing properties of Haskell programs.



2.1 Testing and Testable Specifications 

Testing is by far the most commonly used approach to ensuring software quality. 
It is also very labour intensive, accounting for up to 50% of the cost of software 
development. Despite anecdotal evidence that functional programs require some- 
what less testing (‘Once it type-checks, it usually works’), in practice it is still a 
major part of functional program development. 

The cost of testing motivates efforts to automate it, wholly or partly. Automatic 
testing tools enable the programmer to complete testing in a shorter time, or to 
test more thoroughly in the available time, and they make it easy to repeat tests 
after each modification to a program. 

Functional programs are well suited to automatic testing. It is generally accepted 
that pure functions are much easier to test than side-effecting ones, because one 
need not be concerned with a state before and after execution. In an imperative 
language, even if whole programs are often pure functions from input to output, 
the procedures from which they are built are usually not. Thus relatively large 
units must be tested at a time. In a functional language, pure functions abound 
(in Haskell, only computations in the IO monad are hard to test), and so testing
can be done at a fine grain. 

A testing tool must be able to determine whether a test is passed or failed; 
the human tester must supply a passing criterion that can be automatically 
checked. We use formal specifications for this purpose. QuickCheck comes with a 
simple domain-specific language of testable specifications which the tester uses to 
define expected properties of the functions under test. It is then checked that the 
properties hold in a large number of cases. We call these testable specifications 
properties. The specification language is embedded in Haskell using the class 
system. This means that properties are just normal Haskell functions which can 
be understood by any Haskell compiler or interpreter. Property declarations can
be written either in the same module as the functions they test, or in a separate
Haskell module that imports those functions; the latter is the approach we prefer
in these notes. Either way, properties also serve as checkable documentation of
the behaviour of the code.

A testing tool must also be able to generate test cases automatically. QuickCheck
uses the simplest method, random testing, which competes surprisingly
favourably with systematic methods in practice. However, it is meaningless to
talk about random testing without discussing the distribution of test data. Random
testing is most effective when the distribution of test data follows that
of actual data, but when testing reusable code units as opposed to whole systems
this is not possible, since the distribution of actual data in all subsequent
reuses is not known. A uniform distribution is often used instead, but for data
drawn from infinite sets this is not even meaningful! In QuickCheck, distribution
is put under the human tester’s control, by defining a test data generation language
(also embedded in Haskell), and a way to observe the distribution of test
cases. By programming a suitable generator, the tester can not only control the
distribution of test cases, but also ensure that they satisfy arbitrarily complex
invariants.

¹ Available from http://www.cs.chalmers.se/~rjmh/QuickCheck/



2.2 Defining Properties 

As a first example, we are going to test the standard function reverse which 
reverses a list. This function satisfies a number of useful laws, such as: 

reverse [x] = [x] 

reverse (xs++ys) = reverse ys++reverse xs 
reverse (reverse xs) = xs 

(In fact, the first two of these characterise reverse uniquely.) 

Note that these laws hold only for finite, total values. In all QuickCheck prop- 
erties, unless specifically stated otherwise, we quantify over completely defined 
finite values. 

In order to check such laws using QuickCheck, we represent them as Haskell 
functions. To represent the second law for example, we write: 

prop_RevApp xs ys = reverse (xs++ys) == reverse ys ++ reverse xs 

We use the convention that property function names always start with the prefix
prop_. Nevertheless, prop_RevApp is still a normal Haskell function. If this
function returns True for every possible argument, then the property holds.
However, in order for us to actually test this property, we need to know on what
type to test it! We do not know this yet since the function prop_RevApp has a
polymorphic type. Thus the programmer must specify a fixed type at which the
property is to be tested. So we simply give a type signature for each property,
for example:

prop_RevApp :: [Int] -> [Int] -> Bool
prop_RevApp xs ys = reverse (xs++ys) == reverse ys ++ reverse xs

Lastly, to access the library of functions that we can use to define and test
properties, we have to include the QuickCheck module. Thus we add

import QuickCheck2

at the top of our module. QuickCheck2 is a special version of QuickCheck with
a facility to interoperate with the tracing tools (explained in Section 4).

sort :: Ord a => [a] -> [a]
sort [] = []
sort (x:xs) = insert x (sort xs)
insert :: Ord a => a -> [a] -> [a]
insert x [] = [x]
insert x (y:ys) = if x <= y then x : y : ys
                            else y : insert x ys

Fig. 1. An insertion-sort program.

Now we are ready to test the above property! We load our module into a Haskell 
system (we use GHCi in these notes), and call for example: 

Main> quickCheck prop_RevApp
OK, passed 100 successful tests. 

The function quickCheck takes a property as an argument and applies it to a 
large number of randomly generated arguments — 100 by default — reporting 
“OK” if the result is True in every case. 

If the law fails, then quickCheck reports the counter-example. For example, if 
we mistakenly define 

prop_RevApp :: [Int] -> [Int] -> Bool
prop_RevApp xs ys = reverse (xs++ys) == reverse xs++reverse ys

then checking the property might produce

Main> quickCheck prop_RevApp
Falsifiable, after 1 successful tests:
[2]
[-2,1]

where the counter-example can be reconstructed by taking [2] for xs (the first
argument of the property), and [-2,1] for ys (the second argument). We will
later see how we can use tracing to see what actually happens with the functions
in a property when running it on a failing test case.
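
As a complement to random testing, the correct and the mistaken version of the law can also be compared by brute force over a handful of small lists. The following sketch is not QuickCheck: it uses only the Prelude, and the names prop_RevAppWrong, listsUpTo and counterexamples are introduced here purely for illustration.

```haskell
-- The law as stated in the text.
prop_RevApp :: [Int] -> [Int] -> Bool
prop_RevApp xs ys = reverse (xs ++ ys) == reverse ys ++ reverse xs

-- The mistaken variant from the text, under a different (hypothetical) name.
prop_RevAppWrong :: [Int] -> [Int] -> Bool
prop_RevAppWrong xs ys = reverse (xs ++ ys) == reverse xs ++ reverse ys

-- All lists of length at most n over the given alphabet (illustrative helper).
listsUpTo :: Int -> [a] -> [[a]]
listsUpTo n alphabet =
  concatMap (\k -> sequence (replicate k alphabet)) [0 .. n]

-- Every pair of small lists for which a two-argument property fails.
counterexamples :: ([Int] -> [Int] -> Bool) -> [([Int], [Int])]
counterexamples p =
  [ (xs, ys) | xs <- small, ys <- small, not (p xs ys) ]
  where
    small = listsUpTo 2 [0, 1]
```

Loaded into GHCi, counterexamples prop_RevApp evaluates to the empty list, whereas head (counterexamples prop_RevAppWrong) yields ([0],[1]). Exhaustive enumeration only scales to very small inputs, which is one reason random generation is attractive.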



2.3 Introducing Helper Functions 

Take a look at the insertion sort implementation in Figure 1. Let us design a 
test suite to test the functions in that implementation. 

First, we test the function sort. A cheap way of testing a new implementation
of a sorting algorithm is to use an existing implementation which we trust. We
say that our function sort produces the same result as the standard function
sort which comes from the List module in Haskell.

import qualified List

prop_SortIsSort :: [Int] -> Bool
prop_SortIsSort xs = sort xs == List.sort xs

But what if we do not trust the implementation of the standard sort either? 
Then, we have to come up with properties that say when exactly a function is a 
sorting function. A function sorts if and only if: (1) the output is ordered, and 
(2) the output has exactly the same elements as the input. 

To specify the first property, we need to define a helper function ordered which 
checks that a given list is ordered. 

ordered :: Ord a => [a] -> Bool
ordered [] = True
ordered [x] = True
ordered (x:y:xs) = (x <= y) && ordered (y:xs)

Then, the orderedness property for sort is easy to define:

prop_SortOrdered :: [Int] -> Bool
prop_SortOrdered xs = ordered (sort xs)

For the second property, we also need to define a helper function, namely one 
that checks if two lists have the same elements. 

sameElems :: Eq a => [a] -> [a] -> Bool
[] `sameElems` [] = True
(x:xs) `sameElems` ys = (x `elem` ys) &&
                        (xs `sameElems` (ys \\ [x]))
_ `sameElems` _ = False

Here (\\) is the list difference operator from the List module.

The second sorting property is then rather easy to define too:

prop_SortSameElems :: [Int] -> Bool
prop_SortSameElems xs = sort xs `sameElems` xs
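
To see why both properties are needed, it helps to look at functions that satisfy only one of them. The sketch below assumes the modern module name Data.List (rather than List), writes ordered in an equivalent two-clause form, and introduces two deliberately faulty functions, dropAll and keepAsIs, as well as the combined check sortsCorrectly; all three names are hypothetical.

```haskell
import Data.List (sort, (\\))

-- Equivalent to the three-clause definition of ordered in the text.
ordered :: Ord a => [a] -> Bool
ordered (x:y:rest) = x <= y && ordered (y:rest)
ordered _          = True

-- Compares two lists as multisets, as in the text.
sameElems :: Eq a => [a] -> [a] -> Bool
[]     `sameElems` [] = True
(x:xs) `sameElems` ys = (x `elem` ys) && (xs `sameElems` (ys \\ [x]))
_      `sameElems` _  = False

-- Faulty "sorting" functions, for illustration only.
dropAll :: [Int] -> [Int]
dropAll _ = []      -- the output is ordered, but the elements are lost

keepAsIs :: [Int] -> [Int]
keepAsIs = id       -- the elements are kept, but nothing is put in order

-- Both sorting properties together, for a candidate f on one input.
sortsCorrectly :: ([Int] -> [Int]) -> [Int] -> Bool
sortsCorrectly f xs = ordered (f xs) && (f xs `sameElems` xs)
```

For the input [3,1,2], dropAll passes the orderedness check and keepAsIs passes the same-elements check, yet sortsCorrectly rejects both and accepts the library sort.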



2.4 Conditional Properties and Quantification 

It is good to define and test properties for many functions involved in an
implementation rather than just, say, the top-level functions. Such fine-grained
testing makes it more likely that mistakes are found.

So, let us think about the properties of the function insert, and assume we do
not have another implementation of it which we trust. The two properties that
should hold for a correct insert function are: (1) if the argument list is ordered,
so should the result list be, and (2) the elements in the result list should be the
same as the elements in the argument list plus the first argument.

We can specify the second property in a similar way to the property for sort
defined earlier:

prop_InsertSameElems :: Int -> [Int] -> Bool
prop_InsertSameElems x xs = insert x xs `sameElems` (x:xs)

However, if we try to express the first property, we immediately run into prob- 
lems. It is not just a simple equational property, but a conditional property. 
QuickCheck provides an implication combinator, written ==>, to represent such 
conditional laws. Using implication, the first property for the insertion function 
can be expressed as: 

prop_InsertOrdered :: Int -> [Int] -> Property
prop_InsertOrdered x xs = ordered xs ==> ordered (insert x xs)

Testing such a property works a little differently. Instead of checking the property 
for 100 random test cases, we try checking it for 100 test cases satisfying the 
condition. If a candidate test case does not satisfy the condition, it is discarded, 
and a new test case is tried. So, when a property with an implication successfully 
passes 100 test cases, we are sure that all of them actually satisfied the left hand 
side of the implication. 

Note that the result type of a conditional property is changed from Bool to 
Property. This is because the testing semantics is different for conditional laws. 

Checking prop_InsertOrdered succeeds as usual, but sometimes, checking a 
conditional property produces an output like this: 

Arguments exhausted after 64 tests. 

If the precondition of a law is seldom satisfied, then we might generate many 
test cases without finding any where it holds. In such cases it is hopeless to 
search for 100 cases in which the precondition holds. Rather than allow test case 
generation to run forever, we generate only a limited number of candidate test 
cases (the default is 1000). If we do not find 100 valid test cases among those 
candidates, then we simply report the number of successful tests we were able 
to perform. In the example, we know that the law passed the test in 64 cases. 
It is then up to the programmer to decide whether this is enough, or whether it 
should be tested more thoroughly. 
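
The discard-and-retry behaviour just described can be modelled by a small pure function. This is only a sketch of the idea, not QuickCheck's implementation: the type Result, the function checkConditional and its parameters are names assumed here, and the candidate test cases are supplied as an ordinary list instead of being generated at random.

```haskell
data Result a
  = Passed Int      -- the wanted number of valid cases all succeeded
  | Exhausted Int   -- candidates ran out after this many valid cases
  | Falsified a     -- a valid case for which the conclusion failed
  deriving (Eq, Show)

checkConditional
  :: Int          -- valid cases wanted (100 in the text)
  -> Int          -- candidate budget (1000 in the text)
  -> (a -> Bool)  -- precondition, the left-hand side of ==>
  -> (a -> Bool)  -- conclusion, the right-hand side of ==>
  -> [a]          -- candidate test cases, in the order they are tried
  -> Result a
checkConditional wanted budget pre post = go 0 . take budget
  where
    go passed _ | passed >= wanted = Passed passed
    go passed []                   = Exhausted passed
    go passed (c:cs)
      | not (pre c) = go passed cs        -- discard and try the next one
      | post c      = go (passed + 1) cs  -- a valid, successful test
      | otherwise   = Falsified c         -- a counter-example
```

With the candidates [1 ..], the call checkConditional 100 1000 even (const True) [1 ..] gives Passed 100, while the rarer precondition (\n -> n `mod` 16 == 0) is met by only 62 of the first 1000 candidates and gives Exhausted 62.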



2.5 Monitoring Test Data 

Perhaps it seems that the implication operator has solved our problems, and 
that we are happy with the property prop_InsertOrdered. But have we really 
tested the law for insert thoroughly enough to establish its credibility?

Let us take a look at the distribution of test cases in the 100 tests that we
performed on prop_InsertOrdered, by modifying prop_InsertOrdered as follows:

prop_InsertOrdered :: Int -> [Int] -> Property
prop_InsertOrdered x xs = ordered xs ==>
  classify (null xs) "trivial" $
    ordered (insert x xs)

Checking the law now produces the message

OK, passed 100 successful tests (43% trivial).

The QuickCheck combinator classify does not change the logical meaning of 
a law, but it classifies some of the test cases. In this case those where xs is the 
empty list were classified as “trivial” . Thus we see that a large proportion of the 
test cases only tested insertion into an empty list. 

We can do more than just labelling some test cases with strings. The combinator 
collect gathers all values that are passed to it, and prints out a histogram of 
these values. For example, if we write: 

prop_InsertOrdered :: Int -> [Int] -> Property
prop_InsertOrdered x xs = ordered xs ==>
  collect (length xs) $
    ordered (insert x xs)

we might get as a result:

OK, passed 100 successful tests.
40% 0.
31% 1.
19% 2.
8% 3.
2% 4.

So we see that only 29 (=19+8+2) cases tested insertion into a list with more
than one element. While this is enough to provide fairly strong evidence that 
the property holds, it is worrying that very short lists dominate the test cases 
so strongly. After all, it is easy to define an erroneous version of insert which 
nevertheless works for lists with at most one element. 
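
The kind of tally that collect reports can be approximated with a few lines over an ordinary list of observed values. This is a sketch of the bookkeeping only, not QuickCheck's code; the name histogram and the rounding down of percentages are assumptions made here.

```haskell
import Data.List (group, sort, sortBy)
import Data.Ord (comparing, Down (..))

-- Percentage of occurrences of each distinct value, most frequent first.
histogram :: Ord a => [a] -> [(Int, a)]
histogram xs =
  sortBy (comparing (Down . fst))
    [ (100 * length g `div` total, v) | g@(v:_) <- group (sort xs) ]
  where
    total = length xs
```

Applied to a list containing the lengths of the 100 test cases above (forty 0s, thirty-one 1s, and so on), histogram returns the pairs (40,0), (31,1), (19,2), (8,3) and (2,4), matching the printed table.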

The reason for this behaviour, of course, is that the precondition ordered xs 
skews the distribution of test cases towards short lists. Every generated list of 
length 0 or 1 is ordered, but only 50% of the lists of length 2 are ordered, and 
not even 1% of all lists of length 5 are ordered. Thus test cases with longer 
lists are more likely to be rejected by the precondition. There is a risk of this 
kind of problem every time we use conditional laws, so it is always important to 
investigate the proportion of trivial cases among those actually tested.

It is comforting to be able to monitor the test data, and change the definition
of our properties if we find the distribution too biased. The best solution in this
case is to replace the condition with a custom test data generator for ordered
lists. We write

prop_InsertOrdered :: Int -> Property
prop_InsertOrdered x = forAll orderedList $ \xs ->
  ordered (insert x xs)

which specifies that values for xs should be generated by the test data generator
orderedList. This test data generator can make sure that the lists in
question are ordered and have a more reasonable distribution. Checking the law
now gives “OK, passed 100 successful tests”, as we would expect. QuickCheck
provides support for the programmer to define his or her own test data
generators, with control over the distribution of test data, which we will look at
next.
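
How strongly the precondition ordered xs filters out longer lists, and hence why a dedicated generator pays off, can be checked by counting. The sketch below makes the simplifying assumption that all elements of a list are distinct, so that a list of length n is ordered in exactly one of its n! arrangements; with repeated elements the share of ordered lists is slightly higher. The name orderedShare is introduced here for illustration.

```haskell
import Data.List (permutations)

-- Equivalent to the definition of ordered in the text.
ordered :: Ord a => [a] -> Bool
ordered (x:y:rest) = x <= y && ordered (y:rest)
ordered _          = True

-- Fraction of the arrangements of n distinct values that are ordered.
orderedShare :: Int -> Double
orderedShare n =
  fromIntegral (length (filter ordered perms)) / fromIntegral (length perms)
  where
    perms = permutations [1 .. n]
```

Under this assumption orderedShare 2 is one half, and orderedShare 5 is 1/120, which is below 1%, in line with the figures quoted above.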



2.6 Test Data Generation 

So far, we have not said anything about how test data is generated. The way we
generate random test data of course depends on the type. Therefore, QuickCheck
provides a type class Arbitrary, of which a type is an instance if we know how
to generate arbitrary elements in it.

class Arbitrary a where
  arbitrary :: Gen a

Gen a is an abstract type representing a generator for type a. The programmer 
can either use the generators built in to QuickCheck as instances of this class, or 
supply a custom generator using the forAll combinator, which we saw in the 
previous section. 

Since we will treat Gen as an abstract type, we define a number of primitive
functions to access its functionality. The first one is: 

choose :: (Int, Int) -> Gen Int 

This function chooses a random integer in an interval with a uniform distribution. 
We program other generators in terms of it. 

We also need combinators to build complex generators from simpler ones; to 
do so, we declare Gen to be an instance of Haskell’s class Monad. This involves 
implementing the methods of the Monad class: 

return :: a -> Gen a
(>>=) :: Gen a -> (a -> Gen b) -> Gen b

The first method constructs a constant generator, i.e. return x always generates
the same value x; the second method is the monadic sequencing operator, i.e. g
>>= k first generates an a using g, and passes it to k to generate a b.

Monads are heavily used in Haskell, and there are many useful overloaded stan- 
dard functions which work with any monad; there is even syntactic sugar for 
monadic sequencing (the do notation). By making generators into a monad, we 
are able to use all of these features to construct them. 
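
To make the monadic interface concrete, here is a toy model of a generator type. It is not QuickCheck's Gen: we assume a generator is simply a function that threads an integer seed, and the pseudo-random step next (a linear congruential formula with illustrative constants) stands in for a real random number generator. Current Haskell compilers also require the Functor and Applicative instances shown. The helper pair is a hypothetical name.

```haskell
-- A toy generator: consumes a seed, returns a value and the next seed.
newtype Gen a = Gen { runGen :: Int -> (a, Int) }

instance Functor Gen where
  fmap f (Gen g) = Gen $ \s -> let (a, s') = g s in (f a, s')

instance Applicative Gen where
  pure x = Gen $ \s -> (x, s)             -- constant generator
  Gen f <*> Gen g = Gen $ \s ->
    let (h, s1) = f s
        (a, s2) = g s1
    in (h a, s2)

instance Monad Gen where
  Gen g >>= k = Gen $ \s ->
    let (a, s') = g s                     -- first generate an a using g
    in runGen (k a) s'                    -- then pass it to k

-- Illustrative pseudo-random step; not suitable for serious use.
next :: Int -> Int
next s = (s * 1103515245 + 12345) `mod` 2147483648

-- An integer in the closed interval (lo, hi).
choose :: (Int, Int) -> Gen Int
choose (lo, hi) = Gen $ \s ->
  let s' = next s
  in (lo + (s' `div` 65536) `mod` (hi - lo + 1), s')

-- Sequencing two generators with do notation.
pair :: Gen a -> Gen b -> Gen (a, b)
pair ga gb = do
  a <- ga
  b <- gb
  return (a, b)
```

For instance, fst (runGen (pair (choose (0, 1)) (choose (5, 9))) 42) produces a pair whose components lie in the requested intervals, and the same seed always gives the same pair.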

Defining generators for many types is now straightforward. As examples, we give 
generators for integers and pairs: 

instance Arbitrary Int where
  arbitrary = choose (-20, 20)

instance (Arbitrary a, Arbitrary b) => Arbitrary (a,b) where
  arbitrary =
    do a <- arbitrary
       b <- arbitrary
       return (a,b)

QuickCheck contains such declarations for most of Haskell’s predefined types. 

Looking at the instance of pairs above, we see a pattern that occurs frequently. In
fact, Haskell provides a standard operator liftM2 for this pattern. An alternative
way of writing the instance for pairs is:

instance (Arbitrary a, Arbitrary b) =>
         Arbitrary (a,b) where
  arbitrary = liftM2 (,) arbitrary arbitrary

We will use this programming style later on too.

Since we define test data generation via an instance of class Arbitrary for each
type, we must rely on the user to provide instances for user-defined types.

Instead of producing generators automatically, we provide combinators to enable
a programmer to define his or her own generators easily. The simplest, called oneof, just
makes a choice among a list of alternative generators with a uniform distribution.

For example, a suitable generator for booleans could be defined by: 

instance Arbitrary Bool where
  arbitrary = oneof [return False, return True]

As another example, we could generate arbitrary lists using

instance Arbitrary a => Arbitrary [a] where
  arbitrary = oneof
    [return [], liftM2 (:) arbitrary arbitrary]

where we use liftM2 to apply the cons operator (:) to an arbitrary head and
tail. However, this definition is not really satisfactory, since it produces lists
with an average length of one element. We can adjust the average length of the lists
produced by using frequency instead, which allows us to specify the frequency
with which each alternative is chosen. We define

instance Arbitrary a => Arbitrary [a] where
  arbitrary = frequency
    [ (1, return [])
    , (4, liftM2 (:) arbitrary arbitrary)
    ]

to choose the cons case four times as often as the nil case, leading to an average 
list length of four elements. 
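
Both average lengths follow from a short calculation. Assume, for this sketch, that each recursive choice is independent and selects the cons case with probability p; the symbols p and L are introduced here. The list length L is then geometrically distributed:

```latex
\Pr[L = k] = p^{k}\,(1 - p), \qquad
\mathbb{E}[L] = \sum_{k \ge 0} k\,p^{k}\,(1 - p) = \frac{p}{1 - p}.
% oneof:            p = 1/2  gives  E[L] = (1/2)/(1/2) = 1
% frequency (4:1):  p = 4/5  gives  E[L] = (4/5)/(1/5) = 4
```

So the uniform choice made by oneof gives an average of one element, and weighting the cons case four to one gives an average of four.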



2.7 Generators with Size 

Suppose we have the following datatype of binary trees and the operations size 
and flatten: 

data Tree a = Leaf a
            | Branch (Tree a) (Tree a)
  deriving (Show)

size :: Tree a -> Int
size (Leaf a) = 1
size (Branch s t) = size s + size t

flatten :: Tree a -> [a]
flatten (Leaf a) = [a]
flatten (Branch s t) = flatten s ++ flatten s

An obvious property we would like to hold is that the length of a list which is a 
flattened tree should be the same as the size of the tree. Here it is: 

prop_SizeFlatten :: Tree Int -> Bool
prop_SizeFlatten t = length (flatten t) == size t

However, to test this property in QuickCheck, we need to define our own test 
data generator for trees. Here is our first try: 

instance Arbitrary a => Arbitrary (Tree a) where
  arbitrary = frequency -- wrong!
    [ (1, liftM Leaf arbitrary)
    , (2, liftM2 Branch arbitrary arbitrary)
    ]

We want to avoid choosing a Leaf too often (to avoid small trees), hence the
use of frequency.

However, this definition only has a 50% chance of terminating! The reason is
that for the generation of a Branch to terminate, two recursive generations must
terminate. If the first few recursions choose Branches, then generation terminates
only if very many recursive generations all terminate, and the chance of this is 
small. Even when generation terminates, the generated test data is sometimes 
very large. We want to avoid this: since we perform a large number of tests, we 
want each test to be small and quick. 
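
The 50% figure can be checked numerically. With weights 1 and 2, a Leaf is chosen with probability 1/3 and a Branch with probability 2/3, so the probability p that generation terminates satisfies p = 1/3 + (2/3)p², whose solutions are 1/2 and 1. The sketch below (the names step and terminatesWithin are ours) iterates this recurrence from 0, which gives the probability of finishing within a given recursion depth.

```haskell
-- One step of the recurrence: generation finishes within depth k+1 if a
-- Leaf is chosen (probability 1/3), or a Branch is chosen (probability
-- 2/3) and both subtrees finish within depth k.
step :: Double -> Double
step p = 1 / 3 + (2 / 3) * p * p

-- Probability that generation finishes within the given recursion depth.
terminatesWithin :: Int -> Double
terminatesWithin depth = iterate step 0 !! depth
```

The values of terminatesWithin increase with the depth and approach 1/2, the smaller of the two solutions, rather than 1.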

Our solution is to limit the size of generated test data. But the notion of a 
size is hard even to define in general for an arbitrary recursive datatype (which 
may include function types anywhere). We therefore give the responsibility for 
limiting sizes to the programmer defining the test data generator. We define a 
new combinator 

sized :: (Int -> Gen a) -> Gen a

which the programmer can use to access the size bound: sized generates an a by 
passing the current size bound to its parameter. It is then up to the programmer 
to interpret the size bound in some reasonable way during test data generation. 
For example, we might generate binary trees using 

instance Arbitrary a => Arbitrary (Tree a) where
  arbitrary = sized arbTree
    where
      arbTree n = frequency $
        [ (1, liftM Leaf arbitrary)
        ] ++
        [ (4, liftM2 Branch arbTree2 arbTree2)
        | n > 0
        ]
        where
          arbTree2 = arbTree (n `div` 2)

With this definition, the size bound limits the number of nodes in the generated 
trees, which is quite reasonable. 
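
One way to see how halving the bound limits the trees is to construct the largest tree that such a generator could produce, by always taking the Branch alternative whenever it is available. The sketch is deterministic and does not use QuickCheck; fullTree is a hypothetical name, and size counts leaves as in the text.

```haskell
data Tree a = Leaf a
            | Branch (Tree a) (Tree a)
  deriving (Show)

-- Number of leaves, as in the text.
size :: Tree a -> Int
size (Leaf _)     = 1
size (Branch s t) = size s + size t

-- Worst case for size bound n: choose Branch whenever n > 0.
fullTree :: Int -> Tree ()
fullTree n
  | n > 0     = Branch sub sub
  | otherwise = Leaf ()
  where
    sub = fullTree (n `div` 2)
```

For a bound of 100 this worst-case tree has 128 leaves, and for any positive bound n it has at most 2n leaves, because the bound can be halved only about log₂ n times before it reaches zero.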

We can now test the property about size and flatten: 

Main> quickCheck prop_SizeFlatten 
Falsifiable, after 3 successful tests: 

Branch (Branch (Leaf 0) (Leaf 3)) (Leaf 1) 

The careful reader may have previously noticed the mistake in the definition of 
flatten which causes this test to fail. 

Now that we have introduced the notion of a size bound, we can use it sensibly
in the generators for other types such as integers (with absolute value bounded
by the size) and lists (with length bounded by the size). So the definitions we
presented earlier need to be modified accordingly. For example, to generate
arbitrary integers, QuickCheck really uses the following default generator:

instance Arbitrary Int where
  arbitrary = sized (\n -> choose (-n, n))

We stress that the size bound is simply an extra, global parameter which every 
test data generator may access; every use of sized sees the same bound. We do 
not attempt to ‘divide the size bound among the generators’, so that for example 
a longer generated list would have smaller elements, keeping the overall size of 
the structure the same. The reason is that we wish to avoid correlations between 
the sizes of different parts of the test data, which might distort the test results. 

We do vary the size between different test cases: we begin testing each property 
on small test cases, and then gradually increase the size bound as testing pro- 
gresses. This makes for a greater variety of test cases, which both makes testing 
more effective, and improves our chances of finding enough test cases satisfying 
the precondition of conditional properties. It also makes it more likely that we 
will find a small counter example to a property, if there is one. 



3 Tracing Programs with Hat 

In this section we give a basic introduction to the Hat tools² for tracing Haskell
programs.



3.1 Traces and Tracing Tools 

Without tracing, programs are like black boxes. We see only their input-output 
behaviour. To understand this behaviour our only resort is the static text of 
a program, and it is often not enough. We should like to see the component 
functions at work, the arguments they are given and the results they return. We 
should like to see how their various applications came about in the first place. 
The purpose of tracing tools like Hat is to give us access to just this kind of 
information that is otherwise invisible. 

For more than 20 years researchers have been proposing ways to build tracers 
for lazy higher-order functional languages. Sadly, most of their work has never 
been widely used, because it has been done for locally used implementations 
of local dialect languages. A design-goal for Haskell was to solve the language- 
diversity problem. The problem will always persist to some degree, but Haskell 
is the nearest thing there is to a standard lazy functional language. Now the 
challenge is to build an effective tracer for it, depending as little as possible on 
the machinery of specific compilers or interpreters. 

² Available from http://www.cs.york.ac.uk/fp/hat/.

Tracers for conventional languages enable the user to step through a
computation, stopping at selected points to examine variables. This approach is not so
helpful for a lazy functional language where the order of evaluation is not the 
order of appearance in a source program, and in mid computation variables may 
be bound to complex-looking unevaluated expressions. Like some of its prede- 
cessors, Hat is instead based on derivation graphs for complete computations. 
This representation liberates us from the time-arrow of execution. For example, 
all arguments and results can be shown in the most fully evaluated form that 
they ever attain. The established name for this technique is strictification, but 
this name could be misleading: we do not force functions in the traced program 
into strict variants; all the lazy behaviour of the normally-executed program is 
preserved. 

When we compile a program for tracing it is automatically transformed by a pre- 
processor called hat-trans into a self-tracing variant. The transformed program 
is still in Haskell, not some private intermediate language, so that Hat can be 
ported between compilers. When we run the transformed program, in addition 
to the I/O behaviour of the original, it generates a graph-structured trace of 
evaluation called a redex trail. The trace is written to file as the computation 
proceeds. Trace files contain a lot of detail and they can be very large — tens or 
even hundreds of megabytes. So we should not be surprised if traced programs 
run much less quickly than untraced ones, and we shall need tools to select and 
present the key fragments of traces in source-level terms. 

There are several Hat tools for examining traces, but in these notes we shall 
look at the two used most: hat-trail and hat-observe. As a small illustrative 
application we take sorting the letters of the word ‘program’ using insertion sort. 
That is, to the definitions of Figure 1 we now add 

main = putStrLn (sort "program") 

to make a source program Insort.hs. At first we shall trace the working program;
later we shall look at a variant Badinsort.hs with faults deliberately
introduced.



3.2 Hat Compilation and Execution 

To use Hat, we first compile the program to be traced, giving the -hat option 
to hmake: 

$ hmake -hat Insort 
hat-trans Insort.hs
Wrote TInsort.hs 

ghc -package hat -c -o TInsort.o TInsort.hs 
ghc -package hat -o Insort TInsort.o 

A program compiled for tracing can be executed just as if it had been compiled
normally.

$ Insort
agmoprr

The main difference from untraced execution is that as Insort runs it records
a detailed trace of its computation in a file Insort.hat. The trace is a graph
of program expressions encoded in a custom binary format. Two further files
Insort.hat.output and Insort.hat.bridge record the output and associated
references to the trace file. Trace files do not include program sources, but they 
do include references to program sources, so modifying source files may invalidate 
existing traces. 



3.3 Hat-trail: Basics 

After we have run a program compiled for tracing, creating a trace file, we can 
use Hat tools to examine the trace. The first such tool we shall look at is hat- 
trail. The idea of hat-trail is to answer the question ‘Where did that come from?’ 
in relation to the values, expressions, outputs and error messages that occur in 
a traced computation. The immediate answer will be a parent application or 
name. More specifically: 

- errors: the application or name being reduced when the error occurred (e.g.
  head [] might be the parent of a pattern-match failure);
- outputs: the monadic action that caused the output (e.g. putStr "Hello
  world" might be the parent of a section of output text);
- non-value expressions: the application or name whose defining body contains
  the expression of which the child is an instance (e.g. insert 6 [3] might be
  the parent of insert 6 []);
- values: as for non-value expressions, or the application of a predefined
  function with the child as result (e.g. [1,2]++[3,4] might be the parent of
  [1,2,3,4]).

Parent expressions, and their subexpressions, may in turn have parents of their 
own. The tool is called hat-trail because it displays trails of ancestral redexes, 
tracing effects back to their causes. 
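
The parent relation can be pictured with a very small data model. This is a toy illustration and not Hat's trace format: we assume each node carries a printable label and at most one parent, whereas in a real Hat trace the subexpressions of a node have parents of their own. The names Node, label, parent and trail are introduced here, and the parent given for putStr "Hello world" is an assumption.

```haskell
-- A trace node: a displayed expression and, possibly, its parent.
data Node = Node
  { label  :: String
  , parent :: Maybe Node
  }

-- The chain of ancestors of a node, nearest first.
trail :: Node -> [String]
trail node = case parent node of
  Nothing -> []
  Just p  -> label p : trail p

-- Examples modelled on the list above; mainNode is an assumed parent.
mainNode, putNode, outputNode, insertChild :: Node
mainNode    = Node "main" Nothing
putNode     = Node "putStr \"Hello world\"" (Just mainNode)
outputNode  = Node "Hello world" (Just putNode)
insertChild = Node "insert 6 []" (Just (Node "insert 6 [3]" Nothing))
```

Here trail outputNode lists the putStr application followed by main, which is the kind of ancestry that hat-trail displays one step at a time.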

We can think of redex trails as a generalisation of the stack back-traces for 
conventional languages, showing the dynamically enclosing call-chain leading 
to a computational event. Because of lazy evaluation, the call-stack may not 
actually exist when the event occurs, but there is sufficient information in a 
Hat trace to reconstruct it. When we are tracing the origins of an application 
using hat-trail we have five choices: we can trace the ancestry not only of (1) the 
application itself, as in a stack back-trace, but also of (2) the function, or (3) 
an argument — or indeed, any subexpression of these. We can also ask to see 
a relevant extract of the source program: either (4) the expression of which the 
application is an instance, or (5) the definition of the function being applied.

Hat-trail sessions and requests. We can start a hat-trail session from a shell
command line, or from within existing sessions of hat tools. If we give the shell
command

$ hat-trail Insort 

a new window appears with an upper part headed Output and a lower part
headed Trail:

Output:
agmoprr\n

Trail: hat-trail 2.00 (:h for help, :q to quit)

The line of output is highlighted³ because it is the current selection.

Requests in hat-trail are of two kinds. Some are single key presses with an
immediate response; others are command-lines starting with a colon and only
acted upon when completed by keying return. A basic repertoire of single-key
requests is:

return      add to the trail the parent expression of the current selection
backspace   remove the last addition to the trail display
arrow keys  select (a) parts of the output generated by different actions, or
            (b) subexpressions of expressions already on display

And a basic repertoire of command-line requests is:

:source     show the source expression of which the current selection is an
            instance
:quit       finish this hat-trail session

It is enough to give initial letters, :s or :q, rather than :source or :quit.



Some insertion-sort trails. To trace the output from the Insort computation,
we key return and the Trail part of the display becomes:

Trail: Insort.hs line: 10 col: 8
<- putStrLn "agmoprr"



The source reference is to the corresponding application of putStrLn in the 
program. If we give the command :s at this point, a separate source window 
shows the relevant extract of the program. We can only do two things with a 
source window: (1) look at it; (2) close it. Tracing with Hat does not involve 
annotating or otherwise modifying program sources. 

(Footnote: in these notes, highlighted text or expressions are shown boxed; the
Hat tools actually use colour for highlighting.)

Back to the Trail display. We key return again:

    Trail: Insort.hs line: 10 col: 1
    <- putStrLn "agmoprr"
    <- main



That is, the line of output was produced by an application of putStrLn occurring 
in the body of main. 

So far, so good; but what about the sorting? How do we see where putStrLn's string
argument "agmoprr" came from? By making that string the current selection
and requesting its parent:

    backspace    (removes main),
    right-arrow  (selects putStrLn),
    right-arrow  (selects "agmoprr"),
    return       (requests parent expression)

    Trail: Insort.hs line: 7 col: 19
    <- putStrLn "agmoprr"
    <- insert 'p' "agmorr" | if False



The | symbol here is a separator between a function application and the trace
of a conditional or case expression that was evaluated in its body; guards are 
shown in a similar way. The string "agmoprr" is the result of inserting 'p', the 
head of the string "program", into the recursively sorted tail. More specifically, 
the string was computed in the else-branch of the conditional by which insert 
is defined in the recursive case (because ’p’ <= ’a’ is False). 
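To make the role of the else-branch concrete, here is a minimal sketch of the program being traced. It is an assumption rather than a copy of Insort.hs: we have reconstructed it from the faulty listing in Fig. 2 by repairing the three marked faults. The three lines of main reproduce the traced output, the test behind "| if False", and the parent application of the output string.

```haskell
-- Sketch of the traced program, reconstructed (as an assumption) from
-- Fig. 2 with the three marked faults repaired.
module Main where

sort :: Ord a => [a] -> [a]
sort []     = []
sort (x:xs) = insert x (sort xs)

insert :: Ord a => a -> [a] -> [a]
insert x []     = [x]
insert x (y:ys) = if x <= y
                    then x : y : ys        -- x belongs in front of y
                    else y : insert x ys   -- keep y, insert into the tail

main :: IO ()
main = do
  putStrLn (sort "program")        -- the traced line of output
  print ('p' <= 'a')               -- the condition reported as "if False"
  putStrLn (insert 'p' "agmorr")   -- the parent application of the output
```

Since 'p' <= 'a' is False, the head 'a' of the result comes from the else-branch, which is exactly what the trail reports.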

And so we could continue. For example, following the trail of string arguments:

    <- insert 'p' "agmorr" | if False
    <- insert 'r' "agmor" | if False
    <- insert 'o' "agmr" | if False
    <- insert 'g' "amr" | if False
    <- insert 'r' "am" | if False
    <- insert 'a' "m" | if True
    <- insert 'm' []

But let's leave hat-trail for now.

    :quit
3.4 Hat-observe: Basics

The idea of hat-observe is to answer the question ‘To which arguments, if any, 
was that applied, and with what results?’, mainly in relation to a top-level func- 
tion. Answers take the form of a list of equational observations, showing for each 
application of the function to distinct arguments what result was computed. In 
this way hat-observe can present all the needed parts of an extensional spec- 
ification for each function defined in a program. We also have the option to 
limit observations to particular patterns of arguments or results, or to particular 
application contexts. 

Hat-observe sessions and requests We can start a hat-observe session from a 
shell command line, or from within an existing session of a Hat tool. 

$ hat-observe Insort 

hat-observe 2.00 (:h for help, :q to quit) 

hat-observe> 

In comparison with hat-trail, there is more emphasis on command-lines in
hat-observe, and the main user interface is a prompt-request-response cycle. Requests
are of two kinds. Some are observation queries in the form of application patterns:
the simplest observation query is just the name of a top-level function. Others
are command-lines, starting with a colon, similar to those of hat-trail. A basic
repertoire of command-line requests is:

:info   list the names of functions and other defined values that can be
        observed, with application counts
:quit   finish this hat-observe session

Again it is enough to give the initial letters, :i or :q.

Some insertion-sort observations We often begin a hat-observe session with an
:info request, followed by initial observation of central functions.

    hat-observe> :info
    19 <=    21 insert    1 main    1 putStrLn    8 sort
    hat-observe> sort
    1 sort "program" = "agmoprr"
    2 sort "rogram" = "agmorr"
    3 sort "ogram" = "agmor"
    4 sort "gram" = "agmr"
    5 sort "ram" = "amr"
    6 sort "am" = "am"
    7 sort "m" = "m"
    8 sort [] = []
Here the number of observations is small. Larger collections of observations are
presented in blocks of ten (by default). 
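The eight sort observations are the recursive calls of sort on successive tails of the input. As a rough sketch of what such an equational table contains, we can compute the same argument and result pairs in plain Haskell. The definitions of sort and insert are our reconstruction of the correct program (an assumption), and observations is a hypothetical helper, not part of Hat.

```haskell
module Main where

import Data.List (tails)   -- only tails; sort and insert are defined below

-- Reconstructed (assumed) correct insertion sort.
sort :: Ord a => [a] -> [a]
sort []     = []
sort (x:xs) = insert x (sort xs)

insert :: Ord a => a -> [a] -> [a]
insert x []     = [x]
insert x (y:ys) = if x <= y then x : y : ys else y : insert x ys

-- One (argument, result) pair per application of sort, outermost first.
observations :: [(String, String)]
observations = [ (s, sort s) | s <- tails "program" ]

main :: IO ()
main = mapM_ line (zip [1 :: Int ..] observations)
  where
    line (n, (arg, res)) =
      putStrLn (show n ++ " sort " ++ show arg ++ " = " ++ show res)
```

The only visible difference from the hat-observe listing is that show renders the empty string as "" rather than [].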



    hat-observe> <=
    1 'a' <= 'm' = True
    2 'r' <= 'a' = False
    3 'g' <= 'a' = False
    4 'o' <= 'a' = False
    5 'p' <= 'a' = False
    6 'r' <= 'm' = False
    7 'g' <= 'm' = True
    8 'o' <= 'g' = False
    9 'r' <= 'g' = False
    10 'p' <= 'g' = False
    --more-->

If we key return in response to --more-->, the next block of observations
appears. Alternatively, we can make requests in the colon-command family. Any
other line of input cuts short the list of reported observations in favour of a fresh
hat-observe> prompt.

    --more--> n
    hat-observe>



Observing restricted patterns of applications Viewing a block at a time is not the 
only way of handling what may be a large number of applications. We can also 
restrict observations to applications in which specific patterns of values occur as 
arguments or result, or to applications in a specific context. The full syntax for 
observation queries is 

identifier pattern* [= pattern] [in identifier] 

where the * indicates that there can be zero or more occurrences of an argument 
pattern and the [...] indicate that the result pattern and context are optional. 
Patterns in observation queries are simplified versions of constructor patterns 
with _ as the only variable. Some examples for the Insort computation: 

    hat-observe> insert 'g'
    1 insert 'g' "amr" = "agmr"
    2 insert 'g' "mr" = "gmr"
    hat-observe> insert _ _ = [_]
    1 insert 'm' [] = "m"
    2 insert 'r' [] = "r"
    hat-observe> sort in main
    1 sort "program" = "agmoprr"
    hat-observe> sort in sort
    1 sort "rogram" = "agmorr"
    2 sort "ogram" = "agmor"
    3 sort "gram" = "agmr"
    4 sort "ram" = "amr"
    5 sort "am" = "am"
    6 sort "m" = "m"
    7 sort [] = []

Enough on hat-observe for now.

    hat-observe> :quit

    sort :: Ord a => [a] -> [a]
    -- FAULT (1): missing equation for [] argument
    sort (x:xs) = insert x (sort xs)

    insert :: Ord a => a -> [a] -> [a]
    insert x [] = [x]
    insert x (y:ys) = if x <= y
                      -- FAULT (2): y missing from result
                      then x : ys
                      -- FAULT (3): recursive call is same
                      else y : insert x (y:ys)

    main = putStrLn (sort "program")

Fig. 2. Badinsort.hs, a faulty version of the Insertion-sort program.



3.5 Tracing Faulty Programs 

We have seen so far some of the ways in which Hat tools can be used to trace a 
correctly working program. But a common and intended use for Hat is to trace 
a faulty program with the aim of locating the source of the faults. A faulty 
computation has one of three outcomes: (1) termination with a run-time error, 
or (2) termination with incorrect output, or (3) non-termination. 

A variant of Insort given in Figure 2 contains three deliberate mistakes, each 
of which alone would cause a different kind of fault, as indicated by comments. 
In the following sections we shall apply the Hat tools to examine the faulty 
program, as if we didn’t know in advance where the mistakes were. 



Tracing a Run-time Error We compile the faulty program for tracing, then 
run it: 

$ hmake -hat Badinsort 

$ Badinsort 
No match in pattern. 
Using hat-trail We can easily trace the immediate cause of the error message,
which hat-trail displays as a starting point. We key return once to see the erro- 
neous application, then again to see its parent application: 

    $ hat-trail Badinsort
    Error:
    No match in pattern.
    Trail: Badinsort.hs line: 3 col: 25
    <- sort []
    <- sort "m"

This information can be supplemented by reference to the source program. With
sort [] selected, we can give the :source command to see the site of the
offending application in the recursive equation for sort. If necessary we could
trace the ancestry of the [] argument or the sort application.

Using hat-observe Although hat-trail is usually our first resort for tracing
run-time errors, it is instructive to see what happens if instead we try using
hat-observe.

    $ hat-observe Badinsort
    hat-observe 2.00 (:h for help, :q to quit)
    hat-observe> :info
    7+0 insert    1 main    1 putStrLn    1+7 sort

What do the M+N counts for insert and sort mean? M is the number of
applications that never got beyond a pattern-matching stage involving evaluation
of arguments; N is the number of applications that were actually reduced to an
instance of the function body. Applications are only counted at all if their results
were demanded during the computation. Where a count is shown as a single
number, it is the count N of applications actually reduced, and M = 0.

In the Badinsort computation, we see there are fewer observations of insert
than there were in the correct Insort computation, and no observations at all
of <=. How can that be? What is happening to ordered insertion?

    hat-observe> insert
    1 insert 'p' _|_ = _|_
    2 insert 'r' _|_ = _|_
    3 insert 'o' _|_ = _|_
    4 insert 'g' _|_ = _|_
    5 insert 'a' _|_ = _|_
    6 insert 'm' _|_ = _|_
The symbol _|_ here is an ASCII approximation to ⊥ and indicates an
undefined value. Reading the character arguments vertically, "program" seems to be
misspelt: is there an observation missing between 4 and 5? There are in fact
two separate applications insert 'r' _|_ = _|_ but duplicate observations
are not listed (by default).

The insert observations explain the fall in application counts. In all the ob- 
served applications, the list arguments are undefined. So neither of the defining 
equations for insert is ever matched, there are no <= comparisons (as these oc- 
cur only in the right-hand side of the second equation) and of course no recursive 
calls. 

Why are the insert arguments undefined? They should be the results of sort 
applications. 

    hat-observe> sort
    1 sort "program" = _|_
    2 sort "rogram" = _|_
    3 sort "ogram" = _|_
    4 sort "gram" = _|_
    5 sort "ram" = _|_
    6 sort "am" = _|_
    7 sort "m" = _|_
    8 sort [] = _|_

Though all the sort results are _|_, the reason is not the same in every case.
Observations 1 to 7 show applications of sort that reduced to applications of
insert, and as we have already observed, every insert result is _|_.
Observation 8 is an application that does not reduce at all; it also points us to the
error. 
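The run-time error traced above can be reproduced without Hat. The following is a minimal sketch assuming GHC and its Control.Exception module; only fault (1) is kept, with the other two faults repaired so that the missing equation is isolated. The helper failsToMatch is hypothetical, and GHC words the error differently from the "No match in pattern." message shown in these notes.

```haskell
module Main where

import Control.Exception (PatternMatchFail, evaluate, try)

-- FAULT (1) only: there is no equation for the [] argument.
sort :: Ord a => [a] -> [a]
sort (x:xs) = insert x (sort xs)

insert :: Ord a => a -> [a] -> [a]
insert x []     = [x]
insert x (y:ys) = if x <= y then x : y : ys else y : insert x ys

-- Force the sorted list and report whether a pattern-match failure occurred.
failsToMatch :: String -> IO Bool
failsToMatch s = do
  r <- try (evaluate (length (sort s))) :: IO (Either PatternMatchFail Int)
  return (either (const True) (const False) r)

main :: IO ()
main = failsToMatch "program" >>= print
```

Every insert application must inspect its list argument before either equation can match, so evaluation is driven all the way down to sort [], where it fails. This is why no insert application is ever reduced.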



Tracing a Non-terminating Computation Suppose we correct the first 
fault, by restoring the equation: 

sort [] = [] 

Now the result of running Badinsort is a non-terminating computation, with
an infinite string aaaaaaa... as output. It seems that Badinsort has entered
an infinite loop. The computation can be interrupted by keying control-C.

    $ Badinsort
    Program interrupted. (^C)
    aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa$



(Footnote on the insert results: this insight requires knowledge of the program
beyond the listed applications in hat-observe; for example, it could be obtained
by a linked use of hat-trail, see Section 3.6.)
(Footnote on interrupting: when non-termination is suspected, interrupt as quickly
as possible to avoid working with very large traces.)
Using hat-trail The initial hat-trail display is:

    Error:
    Program interrupted. (^C)
    Output:
    aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa...

We have a choice. We can follow the trail back either from the point of
interruption (the initial selection) or from the output (reached by down-arrow). In this
case, it makes little difference; either way we end up examining the endless list
of 'a's. Let's select the output:

    Output:
    aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa...
    Trail: Badinsort.hs line: 7 col: 19
    <- putStrLn "aaaaaaaa..."
    <- insert 'p' ('a':_) | if False

Notice two further features of expression display:

- the ellipsis ... in the string argument to putStrLn indicates the tail-end of
  a long string that has been pruned from the display;

- the symbol _ in the list argument to insert indicates an expression that
  was never evaluated.

The parent application insert 'p' ('a':_) | if False gives us several
important clues. It tells us that in the else-branch of the recursive case in the
definition of insert the argument's head (here 'a') is duplicated endlessly to
generate the result without ever demanding the argument's tail. This tells us
enough to discover the fault if we didn't already know it.



Using hat-observe Once again, let us also see what happens if we use hat-observe. 
    hat-observe> :info
    78 <=    1+83 insert    1 main    1 putStrLn    8 sort

The high counts for <= and insert give us a strong clue: as <= is primitively
defined, we immediately suspect insert.

    hat-observe> insert
    1 insert 'p' ('a':_) = "aaaaaaaaaa..."
    2 insert 'r' ('a':_) = 'a':_
    3 insert 'o' ('a':_) = 'a':_
    4 insert 'g' ('a':_) = 'a':_
    5 insert 'a' "m" = "a"
    6 insert 'm' [] = "m"
    searching ... (^C to interrupt)
    {interrupted}

(Footnote on following the trail from the point of interruption: the trace from
that point depends on the timing of the interrupt.)

Many more observations would eventually be reported because hat-observe lists 
each observation that is distinct from those listed previously. When the compu- 
tation is interrupted there are many different applications of the form insert 
’p’ (’a’ :_) in progress, each with results evaluated to a different extent. 

But observation 1 is enough. As the tail of the argument is unevaluated, the 
result would be the same whatever the tail. For example, it could be []; so
we know insert 'p' "a" = "aaaa...". This specific and simple failing case
directs us to the fault in the definition of insert. 
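The claim that the result does not depend on the argument's tail can be checked directly. In this minimal sketch, insert carries faults (2) and (3) exactly as in Fig. 2, and we only ever take a finite prefix of its result, because the full result is infinite. The use of undefined as the tail is our illustration of the unevaluated _ in the observation.

```haskell
module Main where

-- insert with faults (2) and (3), as in Fig. 2.
insert :: Ord a => a -> [a] -> [a]
insert x []     = [x]
insert x (y:ys) = if x <= y
                    then x : ys                -- FAULT (2): y missing
                    else y : insert x (y:ys)   -- FAULT (3): list never shrinks

main :: IO ()
main = do
  -- The tail is undefined, yet it is never demanded.
  putStrLn (take 10 (insert 'p' ('a' : undefined)))
  -- The same prefix results when the tail happens to be [].
  putStrLn (take 10 (insert 'p' "a"))
```

Both lines print the same ten characters, so the single application insert 'p' "a" is already a complete failing case.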



Tracing Wrong Output We correct the recursive call from insert x (y:ys) 
to insert x ys, recompile, then execute. 

$ Badinsort 
agop 



Using hat-observe Once again, we could reach first for hat-trail to trace the fault, 
but the availability of a well-defined (but wrong) result also suggests a possible 
starting point in hat-observe: 

    hat-observe> insert _ _ = "agop"
    1 insert 'p' "agor" = "agop"

Somehow, insertion loses the final element 'r'. We should like to see more details
of how this result is obtained: the relevant recursive calls, for example:

    hat-observe> insert 'p' _ in insert
    1 insert 'p' "gor" = "gop"
    2 insert 'p' "or" = "op"
    3 insert 'p' "r" = "p"

Observation 3 makes it easy to discover the fault by inspection. 
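Observation 3 can be replayed in isolation. The sketch below places the remaining faulty definition next to a repaired one; the names insertBad and insertOk are ours, chosen only so that both versions can live in one module.

```haskell
module Main where

-- Remaining fault (2): y is dropped when x is placed in front of it.
insertBad :: Ord a => a -> [a] -> [a]
insertBad x []     = [x]
insertBad x (y:ys) = if x <= y then x : ys else y : insertBad x ys

-- Repaired version: x goes in front of y, and y is kept.
insertOk :: Ord a => a -> [a] -> [a]
insertOk x []     = [x]
insertOk x (y:ys) = if x <= y then x : y : ys else y : insertOk x ys

main :: IO ()
main = do
  putStrLn (insertBad 'p' "r")      -- the application in observation 3
  putStrLn (insertOk  'p' "r")      -- the same application after the repair
  putStrLn (insertBad 'p' "agor")   -- the application found from the output
```

The one-element argument makes the lost element obvious: the then-branch builds its result from x and ys only.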

Using hat-trail If we instead use hat-trail, the same application could be reached 
as follows. We first request the parent of the output; unsurprisingly it is putStrLn 
"agop". We then request the parent of the string argument "agop": 
    Output:
    agop\n
    Trail: Badinsort.hs line: 10 col: 26
    <- putStrLn "agop"
    <- insert 'p' "agor" | if False



As in hat-observe, we see the insert application that loses the character ’r’. 



3.6 Linked use of hat-observe and hat-trail 



Although we have so far made use of hat-observe and hat-trail separately, each 
can be applied within the other using the following commands: 

:o      in hat-trail, with an application of f as the current selection, starts
        a new hat-observe window listing all the traced applications of f
:t N    in hat-observe, following the display of a list of at least N
        applications, starts a new hat-trail window with the Nth application
        as the initial expression

Returning to the last example in the previous section, suppose we begin the
investigation in hat-trail

    Trail: Badinsort.hs line: 10 col: 26
    <- putStrLn "agop"
    <- insert 'p' "agor" | if False

and see that insert is broken. Wondering if there is an even simpler failure
among the traced applications of insert we use :o to list them all in
hat-observe. The list begins:

    1 insert 'p' "agor" = "agop"
    2 insert 'r' "ago" = "agor"
    3 insert 'o' "ag" = "ago"
    4 insert 'g' "ar" = "ag"
    5 insert 'r' "a" = "ar"
    6 insert 'a' "m" = "a"
    7 insert 'm' [] = "m"
    8 insert 'r' [] = "r"
    9 insert 'g' "r" = "g"
    10 insert 'o' "g" = "go"
    --more-->
Observation 6 (or 9) is the simplest so we use :t 6 to request a new session of
hat-trail at this simpler starting point:

    Trail: Badinsort.hs line: 10 col: 26
    <- insert 'a' "m"



We sometimes find it useful to start additional windows for sessions of the same
Hat tool but looking at different parts of the trace:

:o [P]  in hat-observe, where P is an application pattern, starts a new
        hat-observe window with P (if given) as the first observation
        request
:t      in hat-trail, starts a new hat-trail window with the current
        selection as the initial expression

Apart from the determination of its starting point, a hat-observe or hat-trail
session created by a :o or :t command is quite independent of the session from
which it was spawned.

Finally, some facilities so far shown only in one tool are available in the other in
a slightly different form. Two frequently used examples are:

:s N    in hat-observe, following the display of a list of at least N
        applications, creates a source window showing the expression
        of which application N is an instance
=       in hat-trail, if the outermost expression enclosing the current
        selection is a redex, complete the equation with that redex
        as left-hand side, adding = and a result expression (which
        becomes the new selection); or if the current selection is within
        an already completed equation, revert to the display of the
        left-hand-side redex only (which becomes the new selection)

There is a particular reason for the = command in hat-trail. Following trails of 
ancestral redexes means working backwards, from results expressed in the body of 
a function to the applications and arguments that caused them. The movement 
is outward, from the details inside a function to an application context outside 
it. Using = is one way to go forwards when the key information is what happens 
within an application, not how the application came about. Returning once more 
to our running example, here is how = can be used to reach inside the insert 
’a’ "m" computation. 



    Trail: Badinsort.hs line: 8 col: 28
    <- insert 'a' "m" = "a"
    <- insert 'a' "m" | if True
    <- 'a' <= 'm'
4 Combined Testing and Tracing

When testing identifies a test that fails, we may need to trace the failing com- 
putation to understand and correct the fault. But it is usually too expensive to 
trace all tests just in case one of them fails. In this section, we describe a way 
of working with the tools that addresses this requirement. 

We have defined a variant of the top-level testing function quickCheck, called 
traceCheck. It has two modes of working: 

- In running mode, traceCheck seems to behave just like quickCheck, but
  actually keeps track of which test cases succeeded and which test case failed.
  It does this in a special file in the working directory called .tracecheck.

- In tracing mode, traceCheck reads this file, and will repeat the exact test
  case that led to failure, and only that one.

Suppose, for example, that we wish to test an insert function defined
(incorrectly) in the file Insert.hs:

    module Insert where

    insert :: Ord a => a -> [a] -> [a]
    insert x [] = [x]
    insert x (y:ys) = if x <= y then x : ys
                                else y : insert x ys

We could write a test program (Iprop.hs) like this:

    import Insert
    import QuickCheck2
    import List ((\\))   -- list difference; the module is Data.List in later libraries

    sameElems :: Eq a => [a] -> [a] -> Bool
    []     `sameElems` [] = True
    (x:xs) `sameElems` ys = (x `elem` ys) &&
                            (xs `sameElems` (ys \\ [x]))
    _      `sameElems` _  = False

    prop_InsertSameElems :: Int -> [Int] -> Bool
    prop_InsertSameElems x xs = insert x xs `sameElems` (x:xs)

We can load this module into GHCi as usual, and run quickCheck on the prop- 
erty: 

Main> quickCheck prop_InsertSameElems 
Falsifiable, after 1 successful tests: 

1 

[0,1,0] 
But when we want to trace what is going on here we will have to use traceCheck.
Notice that traceCheck is a bit slower than quickCheck since it has to save its 
random seed to a file each time before it evaluates a test — the test might crash 
or loop, and it might never come back. 

    Main> traceCheck prop_InsertSameElems
    Falsifiable, after 0 successful tests:
    0
    [-1,1,0]
    (Seed saved -- trace the program to reproduce.)

We now add the following definition of main to our property module Iprop.hs
so that we have a complete Haskell program which can be compiled for tracing.

    main :: IO ()
    main = traceCheck prop_InsertSameElems

Leaving this definition of main in the file Iprop.hs does not hurt, and reduces
work when tracing the property again later. Let us compile and trace the
program:

    $ hmake -hat -package quickcheck2 Iprop
    $ ./Iprop
    Again, the property is falsifiable:
    0
    [-1,1,0]



The output from traceCheck is now a little different. It only carries out the 
failing test that was saved by the previous traceCheck application, and confirms 
the result. 

A .hat file is now generated, and the tracing tools can be applied as usual. It is 
usually best to start with hat-observe when tracing a failed property, and observe 
the function calls of the functions mentioned in the property. The hat-trail tool 
can then be called upon from hat-observe. 

For our illustrative example hat-observe immediately reveals that the recursive 
call is an even simpler failed application: 

hat-observe> insert 

1 insert 0 [-1,1,0] = [-1,0,0] 

2 insert 0 [1,0] = [0,0] 



If necessary, :t 2 allows us to examine details of the derivation using hat-trail. 
    Trail: Insert.hs line: 5 col: 24
    <- insert 0 [1,0]
    <- insert 0 [1,0]
    <- 0 <= 1



And so on. 
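The failing applications reported by hat-observe can also be replayed by hand. The sketch below is self-contained and uses neither QuickCheck nor Hat: it copies the faulty insert and the sameElems property from this section, imports list difference from Data.List (the modern home of \\), and adds a small exhaustive search over a hypothetical handful of inputs. The name failing is ours.

```haskell
module Main where

import Data.List ((\\))   -- only (\\); insert is the faulty one defined below

-- The faulty insert of Insert.hs.
insert :: Ord a => a -> [a] -> [a]
insert x []     = [x]
insert x (y:ys) = if x <= y then x : ys
                            else y : insert x ys

sameElems :: Eq a => [a] -> [a] -> Bool
[]     `sameElems` [] = True
(x:xs) `sameElems` ys = (x `elem` ys) && (xs `sameElems` (ys \\ [x]))
_      `sameElems` _  = False

prop_InsertSameElems :: Int -> [Int] -> Bool
prop_InsertSameElems x xs = insert x xs `sameElems` (x:xs)

-- Failing cases among a few very small inputs (chosen by hand).
failing :: [(Int, [Int])]
failing = [ (x, xs) | xs <- [[], [0], [1]], x <- [0, 1]
                    , not (prop_InsertSameElems x xs) ]

main :: IO ()
main = do
  print (insert 0 [-1, 1, 0 :: Int])           -- observation 1
  print (insert 0 [1, 0 :: Int])               -- observation 2
  print (prop_InsertSameElems 0 [-1, 1, 0])    -- the saved failing test
  print failing
```

A one-element list is already enough to falsify the property, which fits the remark that the recursive call is an even simpler failed application.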



5 Working with a Larger Program 

So far our example programs have been miniatures. In this section we introduce a 
rather larger program — a multi-module interpreter and compiler — and discuss 
how to handle some of the problems it poses for testing and tracing. 



5.1 An Implementation of Imp 

We begin by outlining an interpreter and compiler for Imp, a simple imperative 
language. Imp programs are command sequences with the following grammar: 

    comseq = com {; com}
    com    = skip
           | print exp
           | if exp then comseq else comseq fi
           | while exp do comseq od
           | name := exp
    exp    = term {op2 term}
    term   = name | value | op1 term | (exp)

Names are lower-case identifiers; values are integers or booleans; op1 and op2
are the usual assortments of unary and binary operators. Here, for example, is
an Imp program (gcd.in) to compute the GCD of 148 and 58:

    x := 148; y := 58;
    while ~(x=y) do
      if x < y then y := y - x
               else x := x - y
      fi
    od;
    print x

The operational behaviour of an Imp program can be represented by a value of 
type Trace String, with Trace defined as follows. 
    data Trace a = a :> Trace a | End | Crash | Step (Trace a)

Each printing of a value is represented by a :>. A program may terminate nor- 
mally (End), terminate with an error (Crash) or fail to terminate by looping 
infinitely. For example, the following trace is generated by a program that prints 
out 1, 2 and then crashes: 

    1 :> 2 :> Crash

In order to deal with non-termination, we have introduced the Step constructor, 
as we explain later (in Section 5.3). 
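As a small sketch of how such traces can be handled, the following module declares the Trace type and reads off the values printed within a bounded prefix. Two things here are assumptions of ours and not taken from the Imp sources: the fixity declaration (needed for 1 :> 2 :> Crash to nest to the right) and the helper printed, together with its reading of Step as a step that prints nothing.

```haskell
module Main where

infixr 5 :>   -- assumed fixity, so that 1 :> 2 :> Crash nests to the right

data Trace a = a :> Trace a | End | Crash | Step (Trace a)
  deriving (Eq, Show)

-- Values printed within the first n constructors of a trace
-- (hypothetical helper; Step is assumed to print nothing).
printed :: Int -> Trace a -> [a]
printed n _ | n <= 0 = []
printed n (x :> t)   = x : printed (n - 1) t
printed n (Step t)   = printed (n - 1) t
printed _ End        = []
printed _ Crash      = []

-- The example from the text: prints 1, then 2, then crashes.
example :: Trace Int
example = 1 :> 2 :> Crash

-- A trace that prints 1 and then steps forever without further output.
looping :: Trace Int
looping = 1 :> spin
  where spin = Step spin

main :: IO ()
main = do
  print (printed 10 example)
  print (printed 10 looping)
```

Bounding the number of constructors inspected keeps the function total even on the infinite trace, which is the kind of situation a bounded-prefix comparison has to cope with.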

Our implementation of Imp includes both direct interpretation of syntax trees 
and a compiler for a stack machine. Here is a module-level overview. 

Behaviour    Defines behavioural traces, their concatenation and
             approximate equality based on bounded prefixes.
Compiler     Generates machine instructions from the abstract syntax
             of a program.
             (Depends on Machine, Syntax, StackMap, Value.)
Interpreter  Computes the behavioural trace of a program by directly
             interpreting abstract syntax.
             (Depends on Syntax, Behaviour, Value.)
Machine      Defines stack machine instructions and the execution rules
             mapping instruction sequences to behavioural traces.
             (Depends on Behaviour, Value.)
Main         Reads an Imp program and reports the behavioural traces
             obtained when it is interpreted and when it is compiled.
             (Depends on Syntax, Parser, Interpreter, Machine, Compiler.)
Parser       Defines parser combinators and an Imp parser using them.
             (Depends on Syntax, Value.)
StackMap     Models the run-time stack during compilation.
Syntax       Defines an abstract syntax for the language.
             (Depends on Value.)
Value        Defines basic values and primitive operations over them.



5.2 Tracing bigger computations 

Even compiling an Imp program as simple as gcd.in, the binary-coded Hat trace
exceeds half a megabyte. If we were tracing a fully-fledged compiler processing
a more typical program, the .hat file could be a thousand times larger. The
development of Hat was motivated by a lack of tracing information for Haskell
programs, but clearly we could have too much of a good thing! How do we cut
down the amount of information presented when tracing larger programs? (1)
At compile-time we identify some modules as trusted; details of computation
within these modules are not recorded in traces. (2) At run-time we use simple
inputs. It is helpful that QuickCheck test-case generators usually start with the
simplest values. (3) At trace-viewing time we set options in the Hat tools to
control how much information is shown and to what level of detail.



Working with trusted modules Usually, untrusted modules depend on trusted
ones, rather than the other way round, so trusted modules need to be compiled
first. It is usually simplest first to compile all modules as trusted, then to
recompile selected modules for full tracing. For example, if we want to compile
the Imp system to trace only the details of computation in module Compiler:

    $ hmake -hat -trusted Main

Compiles everything, with all modules trusted.

    $ touch Compiler.hs
    $ hmake -hat Main

Recompiles Compiler, and Main which depends on it, as fully traced modules.

How effectively does this reduce the amount of trace information? With no
modules trusted (apart from the prelude and libraries), and gcd.in as input, the
:info table in hat-observe lists 88 top-level functions; more than a dozen have
over 100 traced applications and several have over 300. With all but Main and
Compiler trusted the :info table has just 23 entries; all but four of these show
fewer than 10 applications and all have fewer than 30.

(Footnote on compiling trusted modules first: the Haskell prelude and standard
libraries are pre-compiled and trusted by default.)

When a module T is compiled as trusted, applications of exported T functions
in untrusted modules are still recorded, but the details of the corresponding
computation within T are not. For example, in the StackMap module there is a
function to compute the initial map of the stack when execution begins:

    stackMap :: Command -> StackMap
    stackMap c = (0, comVars c)

Details of the Command type (the abstract syntax of Imp programs) and the
significance of the StackMap values need not concern us here; the point is that
even with StackMap trusted, hat-observe reports the application of stackMap:

    1 stackMap
        ("x" := Val (Num 148) :->
        ("y" := Val (Num 58) :->
        (While (Uno Not (Duo Eq (Var "x") (Var "y")))
          (If (Duo Less (Var "x") (Var "y"))
            ("y" := Duo Sub (Var "y") (Var "x"))
            ("x" := Duo Sub (Var "x") (Var "y"))) :->
        Print (Var "x"))))
      = (0, ["x","y"])



But hat-observe does not report the application of the auxiliary function comVars
that computes the second component ["x","y"]. This component is not just left
orphaned, with no trace of a parent; instead it is adopted by the stackMap
application, as this is the nearest ancestral redex recorded in the trace. In
hat-trail, if we select the ["x","y"] component of the result and request the parent
redex it is the stackMap application that is displayed.

Some applications within a trusted module are still recorded. For example, there 
may be applications of untrusted functional arguments in trusted higher-order 
functions, and there may be applications of constructors recorded because they 
are part of a result. 



Controlling the volume of displayed information Even when traces are confined
to specific functions of interest, there may be many applications of these
functions, and the expressions for their arguments and results may be large and
complex. In hat-trail, the number of applications need not concern us: only
explicitly selected derivations are explored, and each request causes only a single
expression to be displayed. In hat-observe, the counts in :info tables warn us
where there are large numbers of applications; by default only unique
representatives are shown when a function is applied more than once to the same
arguments; and patterns can be used to narrow the range of a search. But if the
volume of output from hat-observe is still too high, we have two options:

:set recursive off   Recursive applications (i.e. applications of f in the
                     body of f itself) are not shown.
:set group N         Show only N observations at a time; the default
                     is 10.

In both hat-trail and hat-observe, large expressions can be a problem. Within a
single window, the remedy (apart from making the window larger, after which a
:resize command may be needed) is to control the level of detail to which
expressions are displayed. The main way we can do so is:

:set cutoff N        Show expression structure only to depth N; the
                     default is 10.

Rectangular placeholders (shown here as ■) are displayed in place of pruned
expressions, followed by ellipses in the case of truncated lists. For example, here
once again is the application of stackMap to the abstract syntax for gcd.in, but
lightly pruned (:set cutoff 8):

    stackMap
      ("x" := Val (Num 148) :->
      ("y" := Val (Num 58) :->
      (While (Uno Not (Duo Eq (Var ■) (Var ■)))
        (If (Duo Less (Var ■) (Var ■)) ("y" := Duo ■ ■ ■)
          ("x" := Duo ■ ■ ■)) :-> Print (Var "x"))))

More severely pruned (:set cutoff 4) it becomes a one-liner:

    stackMap ("x" := Val ■ :-> (■ := ■ :-> (■ :-> ■)))



One limitation of depth-based pruning is its uniformity. We face a dilemma if
two parts of an outsize expression are at the same depth, and the details of one
are irrelevant but the details of the other are vital. In hat-trail we can explicitly
over-ride pruning for any selected ■ expression by keying +, and we can explicitly
prune any other selected expression by keying -. A more extravagant solution is
to view the expression in a cascade of hat-trail windows. Returning once more to
the stackMap example, in a first hat-trail window suppose we have the heavily
pruned redex, with a subexpression of interest selected (here the final
(■ :-> ■)):

    stackMap ("x" := Val ■ :-> (■ := ■ :-> (■ :-> ■)))

We give the command :t to spawn a fresh hat-trail session starting with this
subexpression. Pruned to the same cutoff depth it is now revealed to be:

    While (Uno Not (Duo ■ ■ ■)) (If (Duo ■ ■ ■) (■ := ■) (■ := ■)) :->
    Print (Var "x")

Within this subexpression, we can select still deeper subexpressions recursively. 
We can continue (or close) the hat-trail session for each level of detail quite 
independently of the others. 



5.3 Specifying properties of Imp 

Let us think about how we are going to test the compiler and interpreter. There 
might be many properties we would like to test for, but one important property 
is the following: 

[Congruence Property] For any program p, interpreting p should pro- 
duce the same result as first compiling and then executing p. 

To formulate this as a QuickCheck property, the first thing we need to do is to 
define test data generators for all the types that are involved. We will show how 
to define the test data generators for the types Name and Expr. The other types 
have similar generators; see Compiler/Properties.hs for details.

For the type Name, we will have to do something more than merely generating an 
arbitrary string. We want it to be rather likely that two independently generated 
names are the same, since programs where each occurrence of a variable is dif- 
ferent make very boring test data. One approach is to pick the name arbitrarily 
from a limited set of names (say {"a", ..., "z"}). It turns out that it is a good
idea to make this set small when generating small test cases, and larger when
generating large test cases.

arbName :: Gen String
arbName = sized gen
  where
    gen n = elements [ [c] | c <- take (n `div` 2 + 1) ['a'..'z'] ]
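To see how the size parameter controls the chance of repeated names, the pool of names can be computed separately. This is only a sketch: namePool and clashChance are hypothetical helper names, not part of Properties.hs, and the calculation assumes that elements picks uniformly from its list.

```haskell
-- Pool of candidate names at size n, copied from the list comprehension in gen.
namePool :: Int -> [String]
namePool n = [ [c] | c <- take (n `div` 2 + 1) ['a'..'z'] ]

-- Chance that two independent, uniform picks from the pool coincide.
clashChance :: Int -> Double
clashChance n = 1 / fromIntegral (length (namePool n))

-- namePool 0    is ["a"], so every variable is the same one
-- namePool 5    is ["a","b","c"]
-- namePool 1000 has 26 names, since the alphabet is exhausted at size 50
```

At small sizes almost every pair of names clashes, and even at the largest sizes the chance never falls below 1/26, so generated programs keep reading and writing the same variables.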

To generate elements of type Expr (the datatype representing Imp expressions),
we assume that we know how to generate arbitrary Vals (representing Imp values),
Op1s and Op2s (representing unary and binary operators, respectively). The
Expr generator is very similar to the one for binary trees in Section 2.7. We keep
track of the size bound explicitly when we generate the tree recursively. When
the size is not strictly positive any more, we generate a leaf of the tree.

instance Arbitrary Expr where
  arbitrary = sized arbExpr
    where
      arbExpr n =
        frequency $
          [ (1, liftM Var arbName)
          , (1, liftM Val arbitrary)
          ] ++
          concat
          [ [ (2, liftM2 Uno arbitrary arbExpr')
            , (4, liftM3 Duo arbitrary arbExpr2 arbExpr2)
            ]
          | n > 0
          ]
        where
          arbExpr' = arbExpr (n-1)
          arbExpr2 = arbExpr (n `div` 2)

There is no right or wrong way to choose frequencies for the constructors. A 
common approach is to think about the kinds of expressions that are likely 
to arise in practice, or that seem most likely to be counter-examples to our 
properties. The rationale for the above frequencies is the following: We do not 
want to generate leaves too often, since this means that the expressions are 
small. We do not want to generate a unary operator too often, since nesting Not 
or Minus a lot does not generate really interesting test cases. Also, the above 
frequencies can easily be adapted after monitoring test data in actual runs of 
QuickCheck on properties. 

Finally, we can direct our attention towards specifying the congruence property. 
Without thinking much, we can come up with the following property, which 
pretty much directly describes what we mean by congruence: for all p, obey p
should be equal to exec (compile p).

prop_Congruence :: Command -> Bool
prop_Congruence p = obey p == exec (compile p) -- wrong!

However, what happens when the program p is a non-terminating program? In 
the case where obey works correctly, the trace will either be an infinite trace 
of printed values, or the computation of the trace will simply not terminate. In 
both cases, the comparison of the two traces will not terminate either! So, for 
non-terminating programs, the above property does not terminate. 

We have run into a limitation of using an embedded language for properties, and 
testing the properties by running them like any other function. Whenever one 
of the functions in a property does not terminate, the whole property does not 
terminate. Similarly, when one of the functions in a property crashes, the whole 
property crashes. To avoid solving the Halting Problem, we take the pragmatic 
viewpoint that properties are allowed to crash or not terminate, but only in cases 
where they are not valid. 

The solution to the infinite trace problem consists of two phases. 

First, we have to make the passing of time during the execution of a program 
explicit in its trace. We do this so that any non-terminating program will gen- 
erate an infinite trace, instead of a trace that is stuck somewhere. The Step 
constructor is added to the Trace datatype for that reason — the idea is to 
let a trace make a ‘step’ whenever the body of a while-loop in the program has 
completed, so that executing the body of a while loop infinitely often produces 
infinitely many Steps in the trace. 

The second change we make is that when we compare these possibly infinite
traces for equality, we only do so approximately, by comparing a finite prefix of
each trace. The function approx n compares the first n events in its argument
traces for equality:

approx :: Eq a => Int -> Trace a -> Trace a -> Bool
approx 0 _        _        = True
approx n (a :> s) (b :> t) = a == b && approx (n-1) s t
approx n (Step s) (Step t) = approx (n-1) s t
approx n End      End      = True
approx n Crash    Crash    = True
approx n _        _        = False

(A looser definition of approx would not require each occurrence of Step to match up,
allowing more freedom in the compiler, but the current definition will do for now.)

Now we can define a trace comparison operator on the property level, which
compares two traces approximately: for arbitrary strictly positive n, the traces
should approximate each other up to n steps. (We choose strictly positive n since
for n = 0 the approximation is trivially true, which makes an uninteresting test
case.)

(=~=) :: Eq a => Trace a -> Trace a -> Property
s =~= t = forAll arbitrary $ \n ->
            n > 0 ==>
              approx n s t

The new version of the congruence property thus becomes: 

prop_Congruence :: Command -> Property
prop_Congruence p = obey p =~= exec (compile p) 

Note that this is still not the final version of the property; there are some issues 
related to test coverage, which will be discussed in the exercises in Section 7.3. 
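The behaviour of the approximate comparison on infinite traces can be checked in isolation. The sketch below is self-contained under stated assumptions: the Trace declaration is reconstructed from the patterns used by approx (the real one lives in the Compiler directory and may differ), and the traces ticks and finite are hypothetical examples.

```haskell
infixr 5 :>

-- Assumed shape of the trace type, inferred from the patterns of approx.
data Trace a = a :> Trace a | Step (Trace a) | End | Crash

-- Compare the first n events of two traces, as defined in the text.
approx :: Eq a => Int -> Trace a -> Trace a -> Bool
approx 0 _        _        = True
approx n (a :> s) (b :> t) = a == b && approx (n-1) s t
approx n (Step s) (Step t) = approx (n-1) s t
approx n End      End      = True
approx n Crash    Crash    = True
approx n _        _        = False

-- Trace of a loop that prints 1 forever, with one Step per iteration.
ticks :: Trace Int
ticks = 1 :> Step ticks

-- Trace of a program that prints 1 twice and then stops.
finite :: Trace Int
finite = 1 :> Step (1 :> End)

-- approx 100 ticks ticks  terminates and is True
-- approx 3 ticks finite   is True: the first three events agree
-- approx 4 ticks finite   is False: the fourth event is Step against End
```

Comparing ticks with itself using == would never return, whereas approx always inspects at most n events. The last two lines also show why small values of n can hide a difference that a larger n exposes.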



6 Related Work 

There are two other automated testing tools for Haskell. HUnit is a unit testing 
framework based on the JUnit framework for Java, which permits test cases to 
be structured hierarchically into tests which can be run automatically [9]. HUnit 
allows the programmer to define “assertions” — boolean-valued expressions — 
but these apply only to a particular test case, and so do not make up a specifica- 
tion. There is no automatic generation of test cases. However, because running 
QuickCheck produces a boolean result, any property test in QuickCheck could 
be used as a HUnit test case. 

Auburn [10, 11] is a tool primarily intended for benchmarking alternative imple- 
mentations of abstract data types. Auburn generates random “datatype usage 
graphs” (dugs), representing specific patterns of use of an ADT, and records the 
cost of evaluating them under each implementation. Based on these benchmark
tests, Auburn can use inductive classification to obtain a decision tree for the
choice of implementation, depending on application characteristics. It may also 
reveal errors in an ADT implementation, when dugs evaluated under different 
implementations produce different results, or when an operation leads to run- 
time failure. Auburn can produce dug generators and evaluators automatically, 
given the signature of the ADT. Dug generators are parameterised by a vector 
of attributes, including the relative frequency of the different operations and 
the degree of persistence. Auburn avoids generating ill-formed dugs by track- 
ing an abstract state, or “shadow”, for each value of the ADT, and checking 
preconditions expressed in terms of it before applying an operator. 

The more general testing literature is voluminous. Random testing dates from the 
1960s, and is now used commercially, especially when the distribution of random 
data can be chosen to match that of the real data. It compares surprisingly 
favourably in practice with systematic choice of test cases. In 1984, Duran and 
Ntafos compared the fault detection probability of random testing with partition
testing, and discovered that the differences in effectiveness were small [5]. Hamlet
and Taylor corroborated the original results [8]. Although partition testing is
slightly more effective at exposing faults, to quote Hamlet’s excellent survey [7],
“By taking 20% more points in a random test, any advantage a partition test
might have had is wiped out.” QuickCheck’s philosophy is to apply random testing
at a fine grain, by specifying properties of most functions under test. So even 
when QuickCheck is used to test a large program, we always test a small part at 
a time, and are therefore likely to exercise each part of the code thoroughly. 

Many other automatic testing tools require preprocessing or analysis of spec- 
ifications before they can be used for testing. QuickCheck is unique in using 
specifications directly, both for test case generation and as a test oracle. The 
other side of the coin is that the QuickCheck specification language is necessarily 
more restrictive than, for example, predicate calculus, since properties must be 
directly testable. 

QuickCheck’s main limitation as a testing tool is that it provides no information 
on the structural coverage of the program under test: there is no check, for exam- 
ple, that every part of the code is exercised. We leave this as the responsibility 
of an external coverage tool. Unfortunately, no such tool exists for Haskell! It is 
possible that Hat could be extended to play this role. 

Turning now to tracing, the nearest relative to Hat — indeed, the starting point 
for its design — is the original redex-trail system [14,13]. Whereas Hat uses a 
source-to-source transformation and a portable run-time library, the original sys- 
tem was developed by modifying a specific compiler and run-time system. Pro- 
grams compiled for tracing built trail-graphs within the limits of heap memory. 
Large computations often exceeded these limits, even if most parts of a program 
were trusted; to obtain at least partial trails in such cases, when trail-space 
was exhausted the garbage collector applied pruning rules based on trail-length. 
Users had a single viewing tool by which to access the in-heap trail; this tool 
supported backward tracing along the lines of hat-trail, but with a more elabo- 
rate graphical interface. The stand-alone trace files of Hat greatly increase the 
size of feasible traces, and give more permanent and convenient access to traces. 

Another system that had an important influence on the design of the Hat tools is 
HOOD [6]. HOOD (for Haskell Observation-Oriented Debugger) defines a class 
of observable types, for which an observe function is defined. Programmers 
annotate expressions whose values they wish to observe by applying observe 
label to them, where label is a descriptive string. These applicative annotations 
act as identities with a benign side-effect: each value to which an annotated 
expression reduces — so far as it is demanded by lazy evaluation — is recorded to 
file, listed under the appropriate label. As an added bonus, the recording of each 
value in the trace can be “played back” in a way that reveals the order in which its 
components were demanded. Among HOOD’s attractions, it is simply imported 
like any other library module, and programmers observe just the expressions 
that they annotate. Among its drawbacks, expressions do have to be selected 
somehow, and explicitly annotated, and there is no record of any derivation
between expressions, only a collection of final values. 

Then there is Freja [12], a compiler for a large subset of Haskell. Code generated 
by Freja optionally builds at run-time an evaluation dependence tree (EDT) in 
support of algorithmic debugging. In some ways Freja is similar to the redex 
trails prototype: a compiler is specially modified, a trace structure recording 
dependencies is built in the heap, and the programmer’s use of the trace is 
mediated by a single special-purpose tool. Tracing overheads in Freja are kept 
to a minimum by supporting trace-building operations at a low level in a native 
code generator, and by constructing only an initial piece of the trace at the EDT 
root — if a new piece is needed, the program is run again. But the most important 
distinctive feature of Freja is that its algorithmic debugger supports a systematic 
search for a fault. Each node in the EDT corresponds to an equation between 
an application and its result. Shown such an equation, the user gives a yes/no 
response depending on whether the equation correctly reflects their intended 
specification for the function. Only subtrees rooted by an incorrect equation are 
examined; eventually, an incorrect parent equation with only correct children 
indicates an error in the definition of the parent function. Applied to small 
exercises, algorithmic debugging is a superb tool. But for big computations the 
top-down exploration regime demands too much: even if the user is able to judge 
accurately the correctness of many large equations, the route taken to a fault 
may be far longer than, for example, the backward trail from a run-time error. 
Freja can be applied directly to EDT subtrees for specified functions, but this 
only helps if the user knows by some other means which functions to suspect. 
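The search strategy of algorithmic debugging described above can be sketched in a few lines. This is not Freja's implementation: the EDT type, the Oracle type (standing in for the user's yes/no answers) and the example tree are all assumptions made for illustration.

```haskell
-- A hypothetical evaluation dependence tree: each node carries an equation
-- (rendered as a string) and the nodes for the calls made while evaluating it.
data EDT = Node { equation :: String, children :: [EDT] }

-- The oracle stands in for the user: True means the equation is judged correct.
type Oracle = String -> Bool

-- Given a node already judged incorrect, descend into the first incorrect
-- child; if every child is correct, the fault lies in this node's function.
locate :: Oracle -> EDT -> String
locate correct node =
  case filter (not . correct . equation) (children node) of
    []      -> equation node
    (c : _) -> locate correct c

-- Start at the root; a correct root equation means there is nothing to find.
debug :: Oracle -> EDT -> Maybe String
debug correct root
  | correct (equation root) = Nothing
  | otherwise               = Just (locate correct root)

-- Example: an insertion sort whose insert function is faulty.
exampleEDT :: EDT
exampleEDT =
  Node "isort [3,1,2] = [1,3,2]"
    [ Node "isort [1,2] = [1,2]" []
    , Node "insert 3 [1,2] = [1,3,2]"
        [ Node "insert 3 [2] = [3,2]" [] ]
    ]

-- An oracle that rejects exactly the equations with wrong results.
exampleOracle :: Oracle
exampleOracle eq = eq `notElem`
  [ "isort [3,1,2] = [1,3,2]"
  , "insert 3 [1,2] = [1,3,2]"
  , "insert 3 [2] = [3,2]"
  ]

-- debug exampleOracle exampleEDT  is  Just "insert 3 [2] = [3,2]"
```

Each recursive step corresponds to one or more questions put to the user, which is why the route to a fault in a large computation can become long.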

For tracing programs in a language like Haskell, the program-point observations 
of HOOD and the top-down exploration of declarative proof-trees as in Freja are 
the main alternatives to backward tracing based on redex trails. An evaluation 
exercise reported in [1] concluded that none of these approaches alone meets 
all the requirements for tracing, but used in combination they can be highly 
effective. This finding directly motivated the reformulation of redex trails in Hat, 
making it possible to extract equational observations and the equivalent of an 
EDT, and so to provide a multi-view tracing system [15]. The three viewing tools
hat-detect (not described in earlier sections), hat-observe and hat-trail
reflect the influence of Freja, HOOD and the redex-trail prototype.



7 Practical Exercises 

Exercises in this section refer to various example programs. The sources of these 
programs are available from http://www.cs.york.ac.uk/fp/afp02/. 

7.1 About merge-sort (in the Sorting directory) 

Exercise 1 Look at the simple merge-sort program in the source files Mmain.hs 
and Msort.hs. If Mmain is run with words.in as input, what lengths of list
arguments occur in the applications of merge in pairwise, and how many appli-
cations are there for each length? Try to answer by inspection before verifying
your answers using Hat. Hint: in hat-observe, either give a context to a merge
application query or :set recursive off. □



Exercise 2 Examine the recursive pairwise computation. How deep does the 
recursion go? Are all equations in the definition of pairwise really necessary? 
Hint: in hat-trail, trace the ancestry of the list of strings from which the output 
is formed. □ 

Exercise 3 How many comparisons and merge’s does it take to sort a list that 
is already sorted? What about a list that is reverse-sorted? □ 



Exercise 4 Write QuickCheck properties that characterise what each function 
in the Msort module does. Check that your properties hold. What can you say 
about test coverage? □ 



Exercise 5 Look at the improved(?) version of merge-sort in Nmain.hs and 
Nsort.hs. Instead of starting the pairwise merging process merely with unit 
lists, the idea is to find the largest possible ascending and descending sublists. 
However, we have made a deliberate mistake! Find a test case where the property 
of the msort function does not hold. Can you locate and fix the bug? Do all your 
previously defined msort properties now hold? □ 

Exercise 6 How many comparisons and merge’s does the improved (and now 
correct!) merge-sort take for already-sorted input? □ 



Exercise 7 What property should the function ascending have? Check that it 
holds. How lazy is the ascends function? What happens if an element of its list 
argument is undefined? Trace the computation to see why. Can you improve the 
definition of ascends? □ 



7.2 About cryptarithmetic (in the SumPuzzle directory) 

The next few exercises are about a program that solves cryptarithmetic puzzles 
(source files SumPuz.hs and Main.hs). Inputs are lines such as SEND + MORE
= MONEY (an example is provided in the file puzzle3.in). The program has to
find a mapping from letters to digits that makes the sum correct. Your task is to
understand how exactly the program works, and to formulate your understanding
in tested properties.

Exercise 8 Compile the program for tracing, and run it with puzzle3.in as
input. In the process of searching for a solution, the program carries out many 
additions of two digits. The digits are candidate values for letters in the same 
column of the encrypted sum: 

SEND 
+ MORE 

What is the maximum result actually obtained from any such digit addition? The 
result occurs more than once: how many times? (Use :set all and appropriate
application patterns in hat-observe.) Select one example of a maximal sum to 
investigate further using hat-trail. Which letters are being added and with what 
candidate values? What values are assigned to other letters at that point? Why 
does this branch of the search fail to reach a complete solution? □ 

Exercise 9 The function solutions is the heart of the program. As you can see in 
the function solve in SumPuz.hs, the standard top-level way to call the function
solutions is with 0 as the fourth argument and [] as the fifth argument. In
Properties.hs, we have predefined a function find that does exactly that:

find :: String -> String -> String -> [Soln]
find xs ys zs = solutions xs ys zs 0 [] 

In this and the following exercises we are going to write properties about this 
function find. 

The first property to define is a soundness property: the program only reports 
genuine solutions. It should say something like: 

For all puzzles, every element in the found list of solutions is arithmeti- 
cally correct. 

Check that your property holds! Remember that your task is to characterise 
exactly what kind of puzzles the program solves, and in what way. So if your 
property does not hold, use the tracing tools to understand why, and then revise 
your property (not the program) until it is correct. □ 

Exercise 10 Use a test data monitor to check how interesting the test cases 
are. For example, is a test case where there are no solutions interesting? Try 
to eliminate uninteresting tests by adding an appropriate precondition to your 
property. How does this influence the size of the tested puzzles? □ 

Exercise 11 The next property to define is a completeness property: the program 
always finds a solution if there is one. A handy way to do this is to say something 
like: 
For all numbers x and y, if I supply as input the digits of x, plus, the
digits of y, equals, and the digits of x + y, then the list of found solutions
should include this digit-identity.

Again, check that your property holds. If not, use the tracing tools to understand 
why, and revise your property accordingly. □ 

Exercise 12 Hypothetically, how would you change the soundness and complete- 
ness properties if the solutions function worked in such a way that it always only 
returned one solution even if there are many? 



7.3 About Imp (in the Compiler directory) 

The final group of exercises involves testing, tracing, fixing, specifying and ex-
tending the Imp interpreter and compiler.

Exercise 13 Recall the QuickCheck congruence property that should hold for 
the Imp compiler and the interpreter. The version of the Imp system in the 
Compiler directory has been deliberately broken, so it does not satisfy this 
property. Indeed, it hardly works at all: try running it on gcd.in. Use Quick- 
Check and Hat to find the two bugs we have introduced. Fix them! □ 

Exercise 14 There are some functions in which we can apparently introduce 
as many bugs as we want; the congruence property will still hold! Which are 
these functions? Hint: Which functions are used both by the compiler and the 
interpreter? □ 

Exercise 15 Random testing works best if it is applied at a fine grain! Therefore, 
formulate a property that is only going to test compilation and interpretation 
of expressions. Hint: You can reuse the congruence property of programs, but 
generate only programs that print a single expression (which cannot contain 
variables). Is non-termination still an issue? □ 

Exercise 16 Now investigate the test coverage of QuickCheck for your property. 
Insert a test data monitor that checks what kind of traces are generated by 
the programs during test, and check the distribution. What do you think of the 
distribution of test data? Most generated expressions are type incorrect! Adapt 
your property by using the implication operator ==> to discard this rather large 
amount of useless test data. 

Note: To show that without this fix, your property does not have good test 
coverage, introduce the following bug: flip the order of the arguments of binary 
operators in the expression compiler. Can your old property find the bug? Can 
your new one? □ 
Exercise 17 The original congruence property for programs has a similar prob-
lem; the whole program crashes if the condition in an if or while statement
is type incorrect, and this happens a lot during testing. Adapt the program 
congruence property to overcome this problem. □ 

Exercise 18 Suppose the Imp language is extended by generalising assignments 
to multiple assignments. Instead of just one variable name on the left of each :=
there are one or more, separated by commas, and on the right an equal number
of expressions, also separated by commas. A multiple assignment is executed
by first evaluating all the right-hand expressions and then storing the results
in corresponding left-hand variables in left-to-right order. Here is an example
program (power.in) which raises 3 to the power 6:

a, n, x := 3, 6, 1;
while 0 < n do
  if (n\2) = 1 then n, x := n-1, x*a else skip fi;
  a, n := a*a, n/2
od;
print x

By making changes in the following places, revise the Imp interpreter and com- 
piler to work with multiple assignments. 



Syntax       Change the := construction in the Command type.

Parser       Change the final alternative in nonSeqCommand.
             Hint: define listOf :: Parser a -> Parser [a].

Interpreter  Change the := equation in the definition of run.
             Hint: generalise the definition of update.

StackMap     Change the := equation in the definition of comVars.

Compiler     Change the := equation in the definition of compObey.
             Hint: none; we hope you get it wrong!

Test your extension first on power.in, using Hat to investigate any faults. Revise
the program generator in Properties.hs so that the congruence property
is asserted over the extended language. Apply QuickCheck and Hat as necessary 
to achieve a solution that passes an appropriate range of tests. □ 

Exercise 19 Compare the assignments: 

x, y := e1, e2 and y, x := e2, e1

Under what conditions do these two assignments mean the same thing? Formu- 
late this conditional equivalence as a QuickCheck property and check that the 
property holds. □ 
Abstract. Developing interactive Web programs poses unique problems. Due
to the limitations of server protocols, interactive Web programs (conceptually)
consist of numerous “scripts” that communicate with each other through Web
forms and other external storage. For simplistic applications, one can think of 
such scripts as plain functions that consume a Web page (form) and produce a 
Web page in response. For complex applications, this view leads to subtle, and 
costly, mistakes. These lecture notes explain how to overcome many of these 
problems with a mostly functional programming style that composes scripts via 
(hidden) first-class continuations. 



1 Interactive Web Programs 

Designing an interactive Web program is a complex task. Recall how Web programs 
work. When a Web browser submits a request whose path points to a Web program, the 
server invokes the program and hands over the request via any of a number of protocols. 
It then waits for the program to terminate¹ and turns the program’s printed output into
a response that the browser can display.

The Web context has four consequences for the design of an interactive program in a
traditional language. First, an interactive dialog with N interactions consists of roughly
N small programs or “scripts,” because a Web program must terminate before the server
can respond to a request. Second, these scripts must communicate. After all, an inter-
active program must know the history of a dialog. The communication takes place via
a variety of mechanisms (“hidden” fields in Web pages, external storage). Third, the
dialog interface is a Web browser, which empowers consumers to jump back to any
point in a dialog and to interleave stages of the dialog. In short, the programmer may
not assume any execution ordering among the N scripts. Finally, the interface protocol
(HTTP) does not allow the use of the proven model-view-control architecture. In par-
ticular, the server cannot update a page when the state of the server changes. Hence the
program may not assume that the consumer submits a form via a page that accurately
reflects the current state of the server or a stateful Web program, even if there is only
one consumer.



* PLT is a loosely linked group of researchers who produce and use a suite of Scheme program-
ming tools. For more information, see www.plt-scheme.org.

¹ Some new protocols no longer demand complete program termination but something weaker;
the problems remain the same, however.

Exercises 

Exercise 1. Orbitz.com is a travel reservation site that offers flights, rental cars, hotel 
rooms, etc. It naturally invites comparison shopping. Go to Orbitz, search for a list of 
flight, hotel or car rental choices, and perform the following actions: 

1. Use the “open link in new window” option to study the details of one choice. You 
now have two browser windows available for future submissions. 

2. Switch back to the window with the choices. Inspect some other option. Place the 
browser windows next to each other so that you can perform a side-by-side com- 
parison of the two options. 

3. After comparing the options, choose the first one. That is, go to the window with 
the details of the first offering and request to buy the choice. 

At this point, you should expect that the reservation system responds with a page that 
confirms the first choice. On July 26, 2002 this was not the case. ■

Exercise 2. Find other Web sites with problems similar to those of Orbitz’s site (see
ex. 1) and report them to your country’s “better business bureau” for incompetent Web
programming. ■

PLT’s solution to this dilemma is to use an enriched form of functional program-
ming. To solve the control problem, PLT Scheme includes Web dialog functions based
on Scheme’s first-class continuations. As a result, a Scheme programmer can write a
single program with an arbitrary number of consumer interactions, and the program
is (basically) safe for all consumer actions in the browser. In addition, PLT has also
modified DrScheme, its programming environment for Scheme. The new environment 
integrates a Web server and a browser as well as rich capabilities for manipulating rep- 
resentations of XML. Hence, a DrScheme programmer can develop Web programs in 
an interactive and incremental manner, that is, in a way that is normal for a functional 
programmer. 

These lecture notes present the functional aspects of the DrScheme Web program-
ming support. The first section discusses how to design Scheme programs in general.
The second and third sections introduce the basic ideas for interacting via the Web and
a simplified API for this purpose.² The simple API is well-suited for teach-
ing Web programming at the secondary school level. The fourth section explains a more
powerful API. It requires knowledge about XML, XHTML, and DrScheme’s tools for
manipulating those.

Prerequisites: The paper assumes that the reader is familiar with functional program-
ming. Ideally, the reader should also have some basic knowledge of Scheme. The last
section requires a rudimentary understanding of (X)HTML. Some prior experience with
Java Servlet programming or Web programming in conventional scripting languages
such as Perl or PHP will help readers appreciate how much pain PLT Scheme removes
and how much rationality it introduces. ■

² Warning: Both APIs are under construction and likely to change.

2 How to Design Scheme Programs

Designing programs requires a design discipline. These notes use the design discipline 
from How to Design Programs (HtDP) [1], our recently published text book for novice 
programmers. It introduces a series of design recipes. Most chapters introduce design 
recipes that lead from a data definition to a complete program. This section introduces 
the basic design discipline and some useful Scheme libraries. 



2.1 Design Recipe 

Designing a Scheme program consists of six basic steps: 

Data Analysis and Definitions: The first step is to analyze a problem statement with 
the goal of identifying all relevant classes of information. For some of this informa- 
tion, the programming language may already provide a well-suited representation. 
For others, the programmer must define a class. 

Example: If the problem concerns numerical measurements, say, money, velocities, 
distances, temperature, then Scheme’s numbers work just fine. 

Say the problem concerns a physical object such as a ball that moves along a 
straight line. It has more than one attribute but the number of attributes is fixed. 
A structure is the perfect representation for this kind of information: 

(define-struct ball (pos vel color))

;; Ball = (make-ball Number Number Symbol)

This structure definition introduces N + 2 primitives, where N is the number of
fields (here N = 3): a constructor (make-ball), a predicate (ball?), and one
selector per field (ball-pos, ball-vel, ball-color). A
structure definition must come with a data definition, which defines a class of data
and how each instance is constructed. In Scheme, a data definition is expressed as
a line-comment (using “;”).

Contract and Purpose Statement: The second step is to write down a concise de- 
scription of the function’s task and a contract that specifies what kind of data the 
function consumes and produces. 

Example: A currency conversion function may have the following contract and 
purpose statement: 

;; convert : Number -> Number
;; compute the number of euros for a given amount of dollars

The contract and purpose for a ball-moving function may look like this:

;; move : Ball Number -> Number
;; compute the location of a-ball that moves for t time units
(define (move a-ball t) ...)

As this example shows, a programmer may wish to write down the function header
already so that the purpose statements can refer to the parameters.

Behavioral Examples: Next, a programmer must illustrate the data definitions and the
purpose statements with examples. The latter often helps clarify the computational
process.

Example: Here is an example of a red ball that is at location 10 (on the number line)
and moves to the left at a speed of 3:

(make-ball 10 -3 'red)

If this ball moves for 3 time units, it arrives at location 1:

;; move : Ball Number -> Number
;; compute the location of a-ball that moves for t time units
;; Example: (move (make-ball 10 -3 'red) 3) produces 1
(define (move a-ball t) ...)

Function Template: For the fourth step, the recipe requires programmers to spell out
all the data that is available. This includes all the parameters, and if some parameter
is a structure, all of its fields.

Example: The function move consumes one atomic parameter (t) and one struc-
tured parameter (a-ball). So here is the template:

;; move : Ball Number -> Number
;; compute the location of a-ball that moves for t time units
;; Example: (move (make-ball 10 -3 'red) 3) produces 1
(define (move a-ball t)
  ... (ball-pos a-ball) ... (ball-vel a-ball) ... (ball-color a-ball) ...)

As the example shows, a template naturally reminds novice Scheme programmers
of the data that a function can use.

Function Definition: Now, after four steps of preparation, a programmer is ready to
define the function.

Tests: Last, but not least, every function must come with a test suite that catches all
kinds of mistakes, from typos to logical errors. It is natural to base the test suite on
the examples from step 3. Each test case in the test suite should specify the expected
value, the computed value, and an equality predicate. When a test case fails, check
for errors in the expected value and in the function definition.

Example: For move, a simple test case like the following may suffice:

;; move : Ball Number -> Number
;; compute the location of a-ball that moves for t time units
;; Example: (move (make-ball 10 -3 'red) 3) produces 1
(define (move a-ball t)
  (+ (ball-pos a-ball) (* (ball-vel a-ball) t)))

;; Tests:
(= 1 (move (make-ball 10 -3 'red) 3))
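The six steps can be put together in one runnable sketch. It assumes present-day Racket (#lang racket) rather than a DrScheme teaching level; the #:transparent option and the name b are additions for illustration only.

```racket
#lang racket
;; Step 1: data analysis and definition
;; (#:transparent is an assumption; it only makes instances printable)
(define-struct ball (pos vel color) #:transparent)
;; Ball = (make-ball Number Number Symbol)

;; Steps 2-5: contract, purpose, example, and definition
;; move : Ball Number -> Number
;; compute the location of a-ball that moves for t time units
;; Example: (move (make-ball 10 -3 'red) 3) produces 1
(define (move a-ball t)
  (+ (ball-pos a-ball) (* (ball-vel a-ball) t)))

;; The N + 2 = 5 primitives that the structure definition introduces
(define b (make-ball 10 -3 'red))   ; constructor
(ball? b)                            ; predicate
(ball-pos b)                         ; selector
(ball-vel b)                         ; selector
(ball-color b)                       ; selector

;; Step 6: tests (expected value, computed value, equality predicate)
(= 1 (move b 3))
(= 10 (move b 0))
```

Each test evaluates to true when the expected and computed values agree, so a failing test points either at the example or at the definition.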

Figure 1 illustrates how the design recipe works for recursive data definitions, including
heterogeneous lists. Both data definitions involve basic Scheme classes: symbols, e.g.,
'hello, and strings, e.g., "hello". For our purposes, strings just have a richer library of
primitives, including the function string-length, which counts the number of characters
in a string. The class of Xmls consists of all strings and lists with at least two items:
a symbol and a member of Att. All remaining items on this list must also belong to Xml.
The class of Atts is unspecified for now, but includes empty (the empty list).

Data Definition:

;; Xml = (union String (cons Symbol (cons Att LXml)))
;; LXml = (listof Xml)
;; Att = ...

Contract, Purpose Statement:

;; size : Xml -> Number
;; to measure the number of visible chars in a-xml
;; (a string is visible if it is derived from String in Xml)
(define (size a-xml) ...)

Xml Examples:

;; "Hello World"
;; (list 'p empty "This is my first paragraph." "Help!")
;; (list 'ul empty (list 'li empty "one") (list 'li empty "two"))

Examples for size:

;; "Hello World" should produce 11
;; (list 'p empty "This is my first paragraph." "Help!") should produce 32
;; (list 'ul empty (list 'li empty "one") (list 'li empty "two")) should produce 6

Template:

(define (size a-xml)
  (cond
    [(string? a-xml) ...]
    [else (first a-xml) ... (second a-xml) ... (size-for-list (rest (rest a-xml))) ...]))

Definition:

(define (size a-xml)
  (cond
    [(string? a-xml) (string-length a-xml)]
    [else (size-for-list (rest (rest a-xml)))]))

The wishlist:

;; size-for-list : (listof Xml) -> Number
;; to count the number of characters in alox
(define (size-for-list alox) ...)

Tests:

(= 11 (size "Hello World"))
(= 32 (size (list 'p empty "This is my first paragraph." "Help!")))
(= 6 (size (list 'ul empty (list 'li empty "one") (list 'li empty "two"))))

Fig. 1. A sample Scheme program

The program counts the number of characters that are in strings directly derived 
from the Xml data definition. Given two mutually referential data definitions, a program
consists of two functions: one that counts the characters in an Xml element and another 
one that counts characters in an LXml element. The remainder of the figure shows how 
a novice Scheme programmer can construct the function systematically following the 
design recipe. 

The template step deserves some additional explanation. Recall that its purpose is 
to express a data definition as code. To construct it, students use this script: 

1. Is the data definition a union? Use a cond. 

2. How many branches are in the union? Add that many clauses. 

3. Which predicate describes each branch? Write them down. 

4. Which of these predicates test for compound values? Write down the selectors. 

5. If any selection expression produces a value in some other defined set of values, 
add an appropriate function application. 

Consult HtDP [1] for details. 
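Applying the script to the wishlist entry of figure 1 yields the second function of the pair. The following is a sketch of one way to complete it, assuming present-day Racket; the body of size-for-list is not given in the figure and is derived here from the (listof Xml) data definition, which is a union of empty and cons.

```racket
#lang racket
;; Xml = (union String (cons Symbol (cons Att LXml)))
;; LXml = (listof Xml)

;; size : Xml -> Number
;; to measure the number of visible chars in a-xml
(define (size a-xml)
  (cond
    [(string? a-xml) (string-length a-xml)]
    [else (size-for-list (rest (rest a-xml)))]))

;; size-for-list : (listof Xml) -> Number
;; to count the number of characters in alox
;; template: two clauses (empty, cons); the first item is an Xml, so size
;; is applied to it; the rest is a (listof Xml), so the function recurs
(define (size-for-list alox)
  (cond
    [(empty? alox) 0]
    [else (+ (size (first alox))
             (size-for-list (rest alox)))]))

;; Tests:
(= 11 (size "Hello World"))
(= 32 (size (list 'p empty "This is my first paragraph." "Help!")))
(= 6 (size (list 'ul empty (list 'li empty "one") (list 'li empty "two"))))
```

The two functions refer to each other in the same way that the two data definitions do.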

2.2 Lists and S-expressions 

Two classes of data play a central role for Scheme: lists and S-expressions. Both are
generated from a single constant, empty, also known as '(), and a single constructor,
cons. The data definitions are as follows:

(listof X) = (union empty (cons X (listof X))) 

(S-exp X) = (union X (cons (S-exp X) (listof (S-exp X)))) 

The data definitions for both lists and S-expressions are parameterized, though
the base class is often omitted.

The definitions show that cons consumes arbitrary values in the first position and 
only lists in the second position. DrScheme’s teaching levels enforce this restriction. 
To emphasize this perspective, the functions first and rest are used for deconstructing 
cons structures. DrScheme also provides functions such as second, third, fourth and 
so on to select specific items from a list. 
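A short sketch of the constructor and the deconstructing functions follows. It assumes present-day Racket, which, unlike the teaching levels, does not reject a non-list in the second position of cons.

```racket
#lang racket
;; a (listof Symbol) built with the constant and the constructor only
(define l (cons 'a (cons 'b (cons 'c empty))))

(first l)    ; the first item
(rest l)     ; the list without its first item
(second l)   ; same as (first (rest l))
(third l)    ; same as (first (rest (rest l)))

;; full Racket accepts (cons 1 2), but the result is not a list;
;; the teaching levels signal an error instead
(list? (cons 1 2))
```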

Using S-expressions, programmers can easily represent structured forms of data.
In figure 1, for example, Xml is a subset of S-expressions over strings, symbols, and
Atts. In addition to processing such pieces of data, it is also important to build them. To
this end, Scheme has four additional mechanisms for constructing lists, though none of
them should be mistaken for primitive constructors:

list is a function that consumes an arbitrary number of values and constructs a list from
them. For example,

(list 1 2 3) is short for (cons 1 (cons 2 (cons 3 empty))) 

Similarly, 

(list (list 'a 1) (list 'b 2)) is (cons (list 'a 1) (cons (list 'b 2) empty))

quote is a special form. Roughly speaking, it traverses its single subexpression,
inserts list to the right of each left parenthesis, and turns every token into a symbol
if it looks like a symbol.^ For example,

'(1 2 3) is short for (list 1 2 3)

Similarly,

'((a 1) (#t "two")) is (list (list 'a 1) (list #t "two")).



quasiquote, unquote: Scheme, like LISP, also has a mechanism for building com-
plex S-expression shapes using quasiquote (or backquote) and unquote (or comma).
A quasiquote expression is like quote except that every subexpression preceded by
unquote is evaluated.

Using quasiquote and unquote permits programmers to write down large tree pat-
terns in a concise manner. Thus, this definition

;; make-xml : Number Number -> S-exp
(define (make-xml x y)
  (local ([define s "An Adder"])
    `(html ()
       (title ,s)
       (body ()
         (h3 () ,s)
         ,(string-append
            "adding " (number->string x)
            " and " (number->string y)
            " is " (number->string (+ x y)))))))

introduces a function that creates a deeply nested Xml datum from two numbers.
The quasiquote expression contains three occurrences of unquoted expressions.
The first two refer to a locally defined variable; the last evaluates an application
of string-append to six arguments. In particular,



(make-xml 3 4) ; produces
'(html ()
   (title "An Adder")
   (body ()
     (h3 () "An Adder")
     "adding 3 and 4 is 7"))

and

(make-xml 7 2) ; produces
'(html ()
   (title "An Adder")
   (body ()
     (h3 () "An Adder")
     "adding 7 and 2 is 9"))



splicing: Finally, on some occasions, a program may have to splice a list into another
list. For example, a programmer may wish to produce a single (standardized) S-
expression from a list of S-expressions like this:

;; make-page : (listof S-exp) -> S-exp
(define (make-page a-list-of-paragraphs)
  `(html
     (body
       (div ,@a-list-of-paragraphs)
       (p "Thank you for visiting www.drscheme.org."))))



^ Yes, there are precise definitions.

The programmer could replace unquote-splicing (or ,@) with list, cons, and append
instead of quasiquote and unquote, but this would seriously reduce the usefulness
of S-expressions.
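The four mechanisms can be checked against their cons, list, and append counterparts. The sketch below assumes present-day Racket; the names x, paragraphs, and make-page/list are introduced for illustration only.

```racket
#lang racket
;; list versus cons
(equal? (list 1 2 3) (cons 1 (cons 2 (cons 3 empty))))

;; quote versus list
(equal? '((a 1) (#t "two")) (list (list 'a 1) (list #t "two")))

;; quasiquote and unquote: only the unquoted subexpression is evaluated
(define x 3)
(equal? `(sum ,(+ x 4) (+ x 4)) (list 'sum 7 (list '+ 'x 4)))

;; unquote-splicing
;; make-page : (listof S-exp) -> S-exp
(define (make-page a-list-of-paragraphs)
  `(html
     (body
       (div ,@a-list-of-paragraphs)
       (p "Thank you for visiting www.drscheme.org."))))

;; the same function without quasiquote (hypothetical name)
;; make-page/list : (listof S-exp) -> S-exp
(define (make-page/list a-list-of-paragraphs)
  (list 'html
        (list 'body
              (cons 'div a-list-of-paragraphs)
              (list 'p "Thank you for visiting www.drscheme.org."))))

(define paragraphs (list '(p "one") '(p "two")))
(equal? (make-page paragraphs) (make-page/list paragraphs))
```

The second version produces the same S-expression, but the shape of the page is no longer visible in the code.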



;; build-list : natural-number (natural-number -> X) -> (listof X)
;; to construct (list (f 0) ... (f (- n 1)))
(define (build-list n f) ...)

;; filter : (X -> boolean) (listof X) -> (listof X)
;; to construct a list from all those items on alox for which p holds
(define (filter p alox) ...)

;; map : (X -> Y) (listof X) -> (listof Y)
;; to construct a list by applying f to each item on alox
;; that is, (map f (list x-1 ... x-n)) = (list (f x-1) ... (f x-n))
(define (map f alox) ...)

;; andmap : (X -> boolean) (listof X) -> boolean
;; to determine whether p holds for every item on alox
;; that is, (andmap p (list x-1 ... x-n)) = (and (p x-1) (and ... (p x-n)))
(define (andmap p alox) ...)

;; ormap : (X -> boolean) (listof X) -> boolean
;; to determine whether p holds for at least one item on alox
;; that is, (ormap p (list x-1 ... x-n)) = (or (p x-1) (or ... (p x-n)))
(define (ormap p alox) ...)

;; foldr : (X Y -> Y) Y (listof X) -> Y
;; (foldr f base (list x-1 ... x-n)) = (f x-1 ... (f x-n base))
(define (foldr f base alox) ...)

;; foldl : (X Y -> Y) Y (listof X) -> Y
;; (foldl f base (list x-1 ... x-n)) = (f x-n ... (f x-1 base))
(define (foldl f base alox) ...)

;; apply : (X ... -> Y) X ... (listof X) -> Y
;; (apply f base1 ... (list x-1 ... x-n)) = (f base1 ... x-1 ... x-n)
(define (apply f . args) ...)

Fig. 2. Abstract iterators for lists



2.3 Abstraction and List Iteration 

When many functions look alike, it is important to abstract. This corresponds to the step
from the multiplication table to a multiplication algorithm. HtDP [1] naturally provides
a design recipe for this step, too. The abstraction recipe heavily relies on templates. In
addition to learning when to abstract, it is equally important to recognize when to use 
existing abstractions for a given data type. This subsection presents a library of functions 
for manipulating lists and S-expressions, which are important for Web programming. 

Scheme’s libraries for lists and S-expressions are like those found in most functional
languages. The major difference is that Scheme’s basic abstractions often consume a
variable number of arguments. For example,

(equal? (map + (list 1 2 3) (list 7 8 9) (list 12 13 14)) 

(list 20 23 26)) 

holds because map as well as + consume an arbitrary number of values. 
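The iterators of figure 2 can be tried on small lists. The sketch assumes present-day Racket, whose built-in versions of these functions behave as the equations in the figure state.

```racket
#lang racket
;; map with three lists applies + to three arguments at a time
(equal? (map + (list 1 2 3) (list 7 8 9) (list 12 13 14))
        (list 20 23 26))

;; apply spreads the final list over the remaining argument positions
(= (apply + 1 2 (list 3 4)) 10)

;; the other iterators
(equal? (build-list 4 (lambda (i) (* i i))) (list 0 1 4 9))
(equal? (filter even? (list 1 2 3 4)) (list 2 4))
(andmap number? (list 1 2 3))
(ormap string? (list 1 "two" 3))

;; foldr combines from the right, foldl from the left
(equal? (foldr cons empty (list 1 2 3)) (list 1 2 3))
(equal? (foldl cons empty (list 1 2 3)) (list 3 2 1))
```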

The key is to recognize that when a data definition contains a nested listof, it is 
useful to use a list iterator. For example, the function size in figure 1 can be reformulated 
as follows: 

(define (size a-xml)
  (cond
    [(string? a-xml) (string-length a-xml)]
    [else (apply + (map size (rest (rest a-xml))))]))

Tests:

(= 11 (size "Hello World"))
(= 32 (size (list 'p empty "This is my first paragraph." "Help!")))

2.4 Generative Recursion 

As long as a programmer derives the template from a data definition, functions em- 
ploy structural recursion. That is, the recursions in a function’s body consume some 
immediate piece of a given compound value. In contrast, many well-known recursive 
algorithms generate an entirely new piece of data from the given data and recur on it. To 
distinguish these two forms of recursion, HtDP [1] refers to the latter kind as generative 
recursion. 

The literature on algorithms contains an abundance of examples of generative re- 
cursion: gcd, quicksort, binary search, mergesort, Newton’s method, fractals, adaptive 
integration, and so on. Similarly, many programs that interact with a consumer recur 
after generating new data via an input device. Web programming in DrScheme is an 
instance of this form of recursion, because Web programs often generate new data via 
interactions and recur without attention to the structure of the given data. In general, the 
process description in a problem often suggests when to use generative recursion. 
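The difference between the two forms of recursion shows up in the argument of the recursive call. The following sketch assumes present-day Racket; the names sum, gcd-gen, and quick-sort are chosen for illustration.

```racket
#lang racket
;; Structural recursion: the recursive call consumes (rest alon),
;; an immediate piece of the given list.
;; sum : (listof Number) -> Number
(define (sum alon)
  (cond
    [(empty? alon) 0]
    [else (+ (first alon) (sum (rest alon)))]))

;; Generative recursion: the recursive call consumes (remainder n m),
;; a newly generated number that is not a piece of n or m.
;; gcd-gen : natural-number natural-number -> natural-number
(define (gcd-gen n m)
  (cond
    [(= m 0) n]
    [else (gcd-gen m (remainder n m))]))

;; Generative recursion: both recursive calls consume freshly built lists.
;; quick-sort : (listof Number) -> (listof Number)
(define (quick-sort alon)
  (cond
    [(empty? alon) empty]
    [else
     (local ([define pivot (first alon)]
             [define others (rest alon)])
       (append (quick-sort (filter (lambda (x) (< x pivot)) others))
               (list pivot)
               (quick-sort (filter (lambda (x) (>= x pivot)) others))))]))

;; Tests:
(= 6 (sum (list 1 2 3)))
(= 6 (gcd-gen 18 24))
(equal? (quick-sort (list 3 1 2 3)) (list 1 2 3 3))
```

For sum, termination follows from the data definition; for gcd-gen and quick-sort, it needs a separate argument about the generated data.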



2.5 Finger Exercises 



Use DrScheme’s Intermediate Student with Lambda language level for the exercises.

Exercise 3. Develop which-toppings. The function consumes a Results:

Results = (listof (list String Boolean))

and produces the list of strings that are associated with true.

Exercise 4. Develop the function numeric-matrix?. The function consumes an element
of Matrix:

Matrix = (listof (listof (union Number false)))

It produces true if and only if all cells (the items of the inner lists) are numbers.



Exercise 5. Develop the function make-table. It consumes two natural numbers: n and
m. Its result is an Xml (see figure 1). For example,

(equal?
  (make-table 3 2)
  '(table ()
     (tr () (td () "cell00") (td () "cell01"))
     (tr () (td () "cell10") (td () "cell11"))
     (tr () (td () "cell20") (td () "cell21"))))

is true. When the function is tested, use (inform/html (list (make-table 3 2))) to display
the result of the function in a browser.

Add the teachpack servlet2.ss for the code examples and the exercises.



3 Basic Web Scripting 

Programming requires a mental model of the program’s eventual deployment context. 
This is especially important for Web programs, which interact with consumers via a 
Web browser, a GUI toolbox with many unusual capabilities. This section presents such 
a (simplistic) model of Web interactions, the API for a simple library for programming 
Web dialogs, and some basic programming techniques for working with it. 

3.1 Web Interactions 

Consider the problem of developing a currency conversion site.^ For simplicity, let us
expect people to type in a conversion factor and the amount they wish to convert. While 
the problem is just about multiplying two numbers, it illustrates the complexities of 
Web interactions in a perfect manner. If the program uses console-based interaction, the 
design is straightforward. The program asks for two numbers, one at a time; multiplies 
them; and then displays the result. In contrast, a GUI variant of the program displays 
a window with two text fields: one for the conversion factor and one for the amount. 
The callback mechanism multiplies the two numbers and places the result somewhere 
on the window. 

For the design of an equivalent Web program, a programmer must choose whether 
he wishes to use a single Web page to inquire about the factor and the amount or two 
Web pages. At first glance, this looks like the choice between a regular
GUI program and a conventional console program. A closer look, though, reveals that
the second option adds a new feature. Because of the Web browser’s back button and
cloning facility, the two-stage interaction dialog permits consumers to go back to the
second question and to convert as many amounts as they wish, without any change
in the program. Indeed, they can even clone the page, submit an amount in each, and
compare the results.

^ This example is loosely based on Christian Queinnec’s paper on the role of continuations in
Web interactions [5].

Unfortunately, developing a two-stage interaction in the standard CGI script or Java 
servlet world requires significantly more effort than the one-stage version. The pro- 
grammer must design three Web pages and (conceptually) two programs: 

1. page 1 for requesting the conversion factor;

2. a script for processing page 1 and producing page 2;

3. page 2 for requesting the amount;

4. a script for processing page 2 and producing page 3; and

5. page 3, which presents the result.

The problem is that the two scripts (items 2 and 4) must communicate. Specifically, the 
first script must communicate the conversion factor to the second script. Since scripts 
must terminate to produce a response for the consumer, the communication must use 
external media. 

The standard Web programming world supports several mechanisms for communi- 
cating values between scripts: 

1. data embedded in the URL;

2. (hidden) fields in a Web form (the query part of a Web page); 

3. server-side state (files, databases, session objects); and 

4. client-side state (cookies). 

Programmers typically have difficulties choosing the proper medium for multi-stage di- 
alogs. The choice is indeed difficult, because it means choosing between an environment 
and a store, using a programming language perspective. Given the lack of programming 
language training, it is not surprising that programmers often make the wrong choice. 
Worse, even if they make the proper choice, they do not write down the invariants that 
govern this communication channel, because it is outside of the program proper. It thus 
becomes difficult to check and maintain such invariants. 
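To make the second mechanism concrete, here is a sketch of page 2 as an S-expression that carries the conversion factor in a hidden field. Everything in it is an assumption made for illustration: the helper page-2, the script name convert.cgi, the field names, and the representation of attributes as a list of name-value pairs (the notes leave Att unspecified).

```racket
#lang racket
;; page-2 : String -> S-exp   (hypothetical helper, not part of any API)
;; build page 2 of the dialog; the factor entered on page 1 travels along
;; in a hidden field so that the second script can read it back
(define (page-2 factor)
  `(html ()
     (body ()
       (form ((action "convert.cgi") (method "post"))   ; assumed script name
         (input ((type "hidden") (name "factor") (value ,factor)))
         "Enter an amount: "
         (input ((type "text") (name "amount")))
         (input ((type "submit")))))))

(page-2 "33")
```

The invariant that the second script receives a field named factor holding the text from page 1 is visible only in this page-building code, not in the script that relies on it.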

3.2 The Web and the Console 

In principle an interactive Web program should have the same structure as a console
program. For example, the currency converter should consist of nothing but two requests
for numbers, a multiplication, and an output form.

DrScheme’s simple Web API enables just this kind of programming:

(inform "CurrencyConversion.com"
  (number->string
    (* (string->number (single-query "Enter a conversion factor"))
       (string->number (single-query "Enter an amount")))))

The PLT Web server implements the function single-query, which establishes a channel
between the consumer and the program. The function sends a question (string) to the
browser, suspends the Web program, and hands control back to the server. When the
server receives the response for this particular query, it resumes the Web program at the 
proper place. The consumer’s input (also a string) becomes the value of the application 
of single-query. 

Technically, the PLT Web server automatically turns an interaction point into a clo- 
sure that the consumer can call as often as desired. For example, the server turns the 
query about the conversion factor into the following function: 

;; query-point1 : Number -> true
(define (query-point1 conversion-factor)
  (inform "CurrencyConversion.com"
    (number->string
      (* (string->number conversion-factor)
         (string->number (single-query "Enter an amount"))))))

The second interaction point becomes a true closure:

;; query-point2 : Number -> true
(define query-point2
  (local ([define conversion-factor ...])
    (lambda (amount)
      (inform "CurrencyConversion.com"
        (number->string
          (* (string->number conversion-factor)
             (string->number amount)))))))

Here the dots stand for the number to which query-point1 was applied. Every time the
consumer responds to the query about the conversion factor, the server applies
query-point1 to the consumer’s input; every time the consumer responds to the query about the
amount, the server applies query-point2. Thus it is perfectly acceptable if the consumer
clones a page in the currency conversion dialog, backtracks, or performs any other kind 
of whimsical navigation within the Web dialog; it is all just function application. 
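The claim that navigation is just function application can be tried without a server. The sketch below assumes present-day Racket; show stands in for inform and merely returns the text that would be displayed, and make-query-point2 is a hypothetical name for the step that creates the closure from the first response.

```racket
#lang racket
;; show : String String -> String   (stand-in for inform, an assumption)
(define (show title text) (string-append title ": " text))

;; make-query-point2 : String -> (String -> String)   (hypothetical name)
;; the resulting closure remembers the response to the first query
(define (make-query-point2 conversion-factor)
  (lambda (amount)
    (show "CurrencyConversion.com"
          (number->string
           (* (string->number conversion-factor)
              (string->number amount))))))

;; the consumer answers the first query with "2"
(define query-point2 (make-query-point2 "2"))

;; submitting amounts from two cloned pages is two applications
(query-point2 "10")
(query-point2 "25")

;; going back and entering another factor creates a second closure;
;; the first one is unaffected
(define query-point2/b (make-query-point2 "3"))
(query-point2/b "10")
(query-point2 "10")
```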

Figure 3 graphically illustrates how the DrScheme environment supports the de- 
velopment of interactive Web programs. When a programmer evaluates an expression 
that performs Web interactions, the environment opens a Web browser and enables the 
programmer to interact with the program via the browser. Furthermore, a programmer 
can also use DrScheme’s stepper to inspect the standard reduction sequence for the pro-
gram. The lower left part of the image shows the reduction of the first query and how it 
is replaced by 33, which is the input in the browser’s text field (upper right). 

3.3 The Simple Servlet API, Take 1 

Our first version of the currency conversion program is seriously flawed. It provides 
the consumer with a simple text field and then attempts to convert the string input to a 
number. If the consumer submits a text that does not represent a number, the program 
fails. Since the goal of obtaining a particular kind of value from the consumer is quite
common, DrScheme’s simple servlet API supports a variety of consumer queries, which
all result in well-formed values.

Fig. 3. Developing interactive Web programs in DrScheme



Figure 4 specifies the simple API for PLT Scheme servlets. The first data definition
enumerates the kinds of queries that single-query can post. A plain string is turned into
a question with an ordinary text field for submitting responses. A password structure
requests a passphrase, a number structure a number, and a boolean structure an on-
off check box. A yes-no structure poses a Yes-or-No-type question with two possible
phrases as responses. For example, a Web program may ask

(single-query
  (make-yes-no
    "Do you accept the agreement?" "I accept." "I decline."))

The response is either "I accept." or "I decline.", but nothing else.

The design of the currency conversion program can take advantage of this API:

(inform "CurrencyConversion.com"
  (number->string
    (* (single-query (make-number "Enter a conversion factor"))
       (single-query (make-number "Enter an amount")))))

Once “typed” queries are used, both applications of single-query produce a number. If 
the consumer enters anything else, single-query repeats the query until the consumer 
enters a number in the query-associated text field (or gives up). 
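The repetition can be pictured as a small generative recursion. The sketch below assumes present-day Racket and replaces the browser with a list of scripted inputs; ask-number is a hypothetical stand-in and not how the teachpack implements number queries.

```racket
#lang racket
;; ask-number : String (listof String) -> Number   (hypothetical stand-in)
;; walk through the scripted consumer inputs until one of them represents
;; a number; assumes that the list contains at least one such input
(define (ask-number question inputs)
  (local ([define n (string->number (first inputs))])
    (cond
      [(number? n) n]
      [else (ask-number question (rest inputs))])))

;; the consumer first submits two ill-formed texts, then a number
(= 33 (ask-number "Enter a conversion factor" (list "abc" "" "33")))
```

With this behavior behind single-query, the caller never sees anything but a number.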
Queries and their visual counterparts:

Query = (union                               XHTML element
  String                                     question with text field
  (make-password String)                     question with password field
  (make-number String)                       question with text field
  (make-boolean String)                      question with check box
  (make-yes-no String String String)         question with 2-way radio
  ...)

Queries and responses:

Query = (union                               Response = (union
  String                                       String
  (make-password String)                       String
  (make-number String)                         Number
  (make-boolean String)                        Boolean
  (make-yes-no String String:y String:n)       (union String:y String:n)
  ...)                                         ...)

Functions:

;; single-query : Query -> Response
;; send a query, suspend the current program, and
;; produce a response that matches q
(define (single-query q) ...)

;; inform : String String* -> true
;; post some information on a Web page, wait for continue signal
(define (inform title text1 ...) ...)

;; final-page : String String* -> true
;; post some information on a Web page,
;; shut down the servlet and all its continuations
(define (final-page title text1 ...) ...)

Fig. 4. The servlet2.ss teachpack interface, take 1

3.4 From Queries to Dialogs

Suppose a store needs a site that allows a consumer to log into an account. The login
page asks visitors for a name and a passphrase until the two match according to some
database. More specifically, the site requests a name and password, checks them accord-
ing to some database, and if they match, enables the consumer to access some private
information. Otherwise, it informs the consumer of the mismatch and asks again for the
login information.

The need for a comparison suggests an additional function:

;; matching? : DB String String -> Boolean
;; to determine whether the user-name and the password match
(define (matching? a-db user-name password) ...)

The function is applied to a database and to the responses to a plain query and a
password query:

(matching? the-db
           (single-query "User Name")
           (single-query (make-password "Password")))

The word “again” in the problem description naturally suggests a main function based 
on generative recursion: 

;; login : String -> String
;; display a greeting with link to login page
;; determine whether the user-name and the password match,
;; if so produce the user name; if not, try again ...
(define (login greeting)
  (local ([define _useless_ (inform "Login Page" greeting)]
          [define user (single-query "User Name")]
          [define pass (single-query (make-password "Password"))])
    (cond
      [(matching? the-db user pass) user]
      [else (login "The name/password didn't match. Try again")])))

;; Run, program, run:
(login "Welcome to our wonderful e-store.")

The argument to the function represents the greeting to be used. The initial greeting 
is a generic welcome phrase; if the consumer fails to provide a valid name-password 
combination, the greeting is a warning about the mismatch. The result of the function 
is the consumer’s name, which the next stage in the dialog may use to display a specific 
greeting, including advertisement. 
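The notes leave the database and matching? unspecified. As a sketch under stated assumptions, the following uses present-day Racket, represents the database as a list of name-password pairs, and replaces the two queries with a list of scripted attempts; the-db's contents and login/scripted are hypothetical.

```racket
#lang racket
;; DB = (listof (list String String))   ; assumed: user name, password
(define the-db (list (list "ann" "s3cret") (list "bob" "hunter2")))

;; matching? : DB String String -> Boolean
;; to determine whether the user-name and the password match
(define (matching? a-db user-name password)
  (ormap (lambda (entry)
           (and (string=? (first entry) user-name)
                (string=? (second entry) password)))
         a-db))

;; login/scripted : (listof (list String String)) -> String   (hypothetical)
;; like login, but takes the name-password attempts from a list;
;; assumes that some attempt on the list matches
(define (login/scripted attempts)
  (local ([define user (first (first attempts))]
          [define pass (second (first attempts))])
    (cond
      [(matching? the-db user pass) user]
      [else (login/scripted (rest attempts))])))

;; the first attempt fails, the second succeeds
(equal? "bob" (login/scripted (list (list "bob" "wrong") (list "bob" "hunter2"))))
```

As in login, the recursion is generative: it does not follow the structure of any argument but repeats until the consumer supplies a matching pair.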


Exercises 

Exercise 6. Develop the function range-query. It consumes two numbers, low and high. 
Its purpose is to request from the consumer a number in this range. The result is the first 
consumer-supplied number in the specified range.

Exercise 7. Develop a “suspend and resume” service for the Cherwell, the independent 
Oxford University paper. The purpose of the service is to enable subscribers to sus- 
pend the delivery of their newspaper for a vacation. The service should interact with 
consumers as follows: 

1. log in (provide name, subscription number)

2. choose a date for suspending the newspaper delivery (day, month, year) 

3. choose a date for resuming delivery 

4. print page with information, ask to continue if they wish to confirm

Hint: The program should ensure that the dates are ordered properly, but for now define
a date comparison function that always produces true.

Exercise 8. Implement a Web program that plays “Who Wants To Be A Millionaire?” 
with a consumer. Prepare a list of ten questions. Each question comes with some re- 
sponses, one of them the correct one. The program poses a question and its possible 
responses. As long as the consumer chooses the correct response, the game continues. 
When the consumer submits an incorrect response, the program terminates and the con- 
sumer has lost. When there are no more questions, the consumer has won.

4 Intermediate Web Scripting 

The servlet API in figure 4 forces Web programmers to obtain one piece of information 
at a time. Thus, if the program needs a person’s first and last name, it must conduct 
two queries. Similarly, if the program needs to find out which color a consumer likes
best, it must interact via a text field or a nested sequence of yes-no questions. None
of these tricks is satisfactory, however. To avoid them, the simple servlet API provides
additional machinery.

4.1 Radio Choices 

Consider the problem of finding out a consumer’s preferred choice of color for some 
product. Suppose the product comes in five colors: green, red, blue, white, and black. 
The choices naturally make up a list, and the consumer must pick exactly one of the 
items on the list. 

The top half of figure 5 introduces a new kind of query that serves exactly this
purpose. Like the yes-no query, the radio query consists of a question and some
given responses. Instead of two possible responses, though, the radio structure can
pose an arbitrary number of them. With the radio structure, choosing a color is no
longer a problem:

(single-query
  (make-radio
    "Please choose a color from the following list"
    (list "green" "red" "blue" "white" "black")))

The result of this application is one of the five colors. 


Exercises 

Exercise 9. Modify the “suspend and resume” service for the Cherwell from exercise 7. 
Use radio structures to request dates, so that consumers can’t make silly mistakes.

Exercise 10. Develop a Web page that “sells” t-shirts that come in four sizes.

Queries and their visual counterparts:

Query = (union                                  ;; XHTML element
  (make-radio String                            ;; question with n >= 1
              (cons String (listof String))))   ;; mutually exclusive responses

Queries and responses:

Query = (union                                  Response = (union
  (make-radio String loString:choices))           loString:choices)

Form queries and tabular responses:

FormItem = (list Symbol Query)                  TableItem = (list Symbol Response)
Form = (cons FormItem (listof FormItem))        Table = (cons TableItem (listof TableItem))

Functions:

;; form-query : Form -> Table
;; posing a list of questions with tag

;; extract/single : Symbol Table -> Response
;; extracting the response for a given tag;
;; the tag must occur exactly once

;; extract : Symbol Table -> (listof Response)
;; extracting all the responses for a given tag;
;; the list is empty if there is no such tag

Fig. 5. The simple servlet API, take 2



4.2 Forms and Tables 



Consumers do not like to wait for Web pages. Web programs therefore should avoid
sending more pages than necessary for a proper implementation of the desired program
functionality. Because of this constraint the simple servlet API also includes form-query.
Unlike single-query, this function can pose a number of queries on a single page.
It does not just consume a list of queries, however. Instead, the function consumes a
form and produces a table. A form is a list of couples. Each couple is a list of two val-
ues: a symbol, which plays the role of a tag, and a query. A table is a list of couples,
that is, tagged responses. The tags in either a form or a table make up its domain. Given
a form f, form-query produces a table that has the same domain as f.

Take a look at this example:

(form-query
  (list
    (list 'first "first name")
    (list 'last "last name")
    (list 'yob "year of birth")))

This query requests three pieces of information at once: a consumer’s first name, last 
name, and year of birth. 
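The relationship between a form and its table can be written down as data. In this sketch, which assumes present-day Racket, the responses in a-table are invented and domain is a hypothetical helper that is not part of the servlet API.

```racket
#lang racket
;; a form: tagged queries (plain strings are text-field queries)
(define a-form
  (list (list 'first "first name")
        (list 'last "last name")
        (list 'yob "year of birth")))

;; a table that form-query could produce for a-form (responses invented)
(define a-table
  (list (list 'first "Ada")
        (list 'last "Lovelace")
        (list 'yob "1815")))

;; domain : (union Form Table) -> (listof Symbol)   (hypothetical helper)
;; to collect the tags of a form or a table
(define (domain f-or-t) (map first f-or-t))

;; a form and the resulting table have the same domain
(equal? (domain a-form) (domain a-table))
```

Because all three queries are plain strings, all three responses are strings, including the year of birth.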

The tags in a form don’t have to be unique. For example, the program fragment in
figure 6 asks four questions, three of them labeled 'likes. It counts how many of those
check boxes the consumer checked and confirms the tally.



;; like : String -> (list 'likes Query)
(define (like lang)
  (list 'likes (make-boolean (string-append "Do you like " lang "?"))))

;; Form
(define language-query
  (cons '(name "Name:") (map like '("Scheme" "ML" "Haskell"))))

(inform
  "Okay, so you like "
  (number->string
    (length
      (filter (lambda (x) (and (eq? (first x) 'likes) (second x)))
              (form-query language-query))))
  " languages.")

Fig. 6. A form query



The simple servlet API provides two functions for extracting items from tables. The 
first one, extract/single, finds the response that is associated with a unique tag; it raises 
an exception if the table contains no response or more than one response labeled with 
the given tag. The second one, extract, finds all the responses associated with a given 
tag. If there aren’t any, it produces empty. 
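
The behavior of the two selectors can be summarized in a few lines. This is a sketch in Python under the assumption that a table is a list of (tag, response) pairs; the function names mirror extract and extract/single but are otherwise illustrative, and the choice of `ValueError` as the exception is an assumption.

```python
# A sketch of the two table selectors, assuming a table is a list of
# (tag, response) pairs; the Python names are illustrative only.

def extract(tag, table):
    """Return all responses labeled with tag (an empty list if there are none)."""
    return [response for t, response in table if t == tag]

def extract_single(tag, table):
    """Return the unique response labeled with tag; raise otherwise."""
    responses = extract(tag, table)
    if len(responses) != 1:
        raise ValueError(
            f"expected exactly one response for {tag!r}, found {len(responses)}")
    return responses[0]

# A table like the one that the form in figure 6 might produce.
table = [("name", "Matthias"), ("likes", True), ("likes", False), ("likes", True)]
liked = sum(1 for response in extract("likes", table) if response)
```

Here `liked` plays the role of the tally that figure 6 computes with filter and length.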



4.3 From Forms to Dialogs 

Forms and tables greatly facilitate the development of serious Web dialogs. Consider 
a dialog for registering participants for the summer school on Advanced Functional 
Programming. It is important for the summer school management to obtain some basic 
information from all the participants — students and speakers — and to determine which 
services they need. 

Let’s be more concrete:

;; welcome : true -> true
(define (welcome x)
  (inform "Sign-up Page" welcome-page))

(define welcome-page "Welcome to the AFP Sign-Up Page ... ")

;; basic-info : true -> Table[first,last,email,speaker]
;; request basic info from participant; ensure that 'email is an email address
(define (basic-info go)
  (local ([define basic (form-query info-page)])
    (cond
      [(conforming? (extract/single 'email basic)) basic]
      [else ... (basic-info true) ...])))

(define info-page ;; Form[first,last,email,speaker]
  (list (list 'first "First Name")
        (list 'last "Last Name:")
        (list 'email "Email address:")
        (list 'speaker (make-boolean "Are you a speaker?"))))

;; determine-options : Table[X] -> Table[X,housing,meals,banquet]
;; dispatch to additional page per option, append all options chosen
(define (determine-options contact-info)
  (if (extract/single 'speaker contact-info)
      (append contact-info '((housing "prime") (meals "full") (banquet "yes")))
      (local ([define t (form-query option-page)])
        (append contact-info
                (get-housing-choices (extract/single 'housing t))
                (get-meal-choices (extract/single 'meals t))
                (list (list 'banquet (extract/single 'banquet t)))))))

(define option-page ;; Form[housing,meals,banquet]
  (list (list 'housing (make-boolean "Do you need housing?"))
        (list 'meals (make-boolean "Do you wish to participate in the meal plan?"))
        (list 'banquet (make-boolean "Do you wish to buy a banquet ticket?"))))

;; confirm-info+choices : Table -> true
(define (confirm-info+choices t)
  (inform "Confirmation" ... (extract/single 'first t) ...))



Fig. 7. A simple AFP workshop registration site 



1. The Web program should gather basic information such as first name, last name,
email address, and whether the participant is a speaker. Speakers automatically get
a first-class room-and-board package.

2. Next the program asks the participant whether he is interested in on-campus
housing, a meal plan, and the banquet dinner.

3. Depending on what the participant requested, the program requests more informa- 
tion. In particular, 

(a) if the participant opted for on-campus housing, he should now choose from 
prime rooms, single rooms, or double rooms. 

(b) if the participant opted for a meal plan, management needs to know whether he 
would like both lunches and dinners or just dinners. 

Finally, the program should summarize the chosen options and ask for a final
commitment from the participant. A realistic version should also compute the total cost and
request payment by a certain date.

The explanation suggests that this particular dialog consists of a collection of ques- 
tions that can be arranged in a dag (directed acyclic graph). One natural translation of 
the dag into a program is a composition of functions (nested expression), and each such 
function in turn may consist of additional function compositions: 

(confirm-info+choices (determine-options (basic-info (welcome ...))))

This expression first welcomes the participant to the Web site, collects basic informa- 
tion, determines what other information it should collect, and finally requests a confir- 
mation. 

Composing functions requires that they have a common type. For a dialog such as 
this one, think of the data as a record of information about the participant. Each stage 
of the dialog acquires new information from the participant and the record grows as it 
flows through the functions. Since form-query produces Tables and since a Table is a 
convenient representation for an ever-growing record, it is natural to use Table as the 
connecting type between the functions. The contracts for basic-info, determine-options,
and confirm-info+choices in figure 7 reflect this discussion. The domain of basic-info is
true, because it simply waits for a signal from the browser that the potential participant
has entered the site. For symmetry, the range of confirm-info+choices is true, because
the result is simply a signal that the dialog has terminated properly.

For a first instructive example, take a look at basic-info in figure 7. The function
comes with an auxiliary Form definition that specifies the form with which it
gathers the necessary information. The function gathers information with form-query and
info-page. It follows from the contract for form-query that basic is a table with
the domain first, last, email, and speaker.

The second instructive example is determine-options, whose contract shows that the 
function adds three fields to the given table: ’housing, ’meals, and ’banquet. To do 
so, the function exploits the information in the table that it consumes. In particular, the 
sign-up program shouldn’t ask a speaker about housing and meal choices; management 
will pay for all expenses. Similarly, a participant who wishes not to stay on campus 
should not need to answer any questions about housing. To accomplish this, determine- 
options uses form-query auxiliary dialogs and even two auxiliary functions: 

;; get-housing-choices : Boolean -> Table[housing]
;; determine what kind of housing the participant wishes to rent

;; get-meal-choices : Boolean -> Table[meals]
;; determine what kind of meal plan the participant wishes to buy

Both consume data to indicate whether a participant wishes to book on-campus
housing and meals, respectively. Both produce a table, with an appropriate domain, which
is then appended to the information about the participant. In short, determine-options
produces its table by composing partial dialogs and tables.
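
The idea of a table that grows as it flows through the stages of a dialog can be sketched as follows. The sketch is in Python and simplifies figure 7: the `answers` dictionaries are hypothetical stand-ins for the consumer's responses, the participant data is invented, and the non-speaker branch reads the three options directly instead of running the auxiliary housing and meal dialogs.

```python
# A sketch of a dialog as a composition of table-producing stages.
# `answers` is a hypothetical stand-in for the consumer's responses.

def basic_info(answers):
    """First stage: produce a table with the basic participant information."""
    return [(tag, answers[tag]) for tag in ("first", "last", "email", "speaker")]

def determine_options(contact_info, answers):
    """Second stage: extend the table with housing, meals, and banquet fields."""
    if dict(contact_info)["speaker"]:
        # Speakers are not asked; they automatically get the full package.
        return contact_info + [("housing", "prime"), ("meals", "full"),
                               ("banquet", "yes")]
    return contact_info + [(tag, answers[tag])
                           for tag in ("housing", "meals", "banquet")]

speaker = {"first": "Sam", "last": "Speaker", "email": "sam@example.org",
           "speaker": True}
student = {"first": "Ann", "last": "Lee", "email": "ann@example.org",
           "speaker": False, "housing": "double", "meals": "dinners",
           "banquet": "no"}

speaker_table = determine_options(basic_info(speaker), speaker)
student_table = determine_options(basic_info(student), student)
```

Both calls yield a table whose domain is the original four tags followed by housing, meals, and banquet, which is what the contract Table[X] -> Table[X,housing,meals,banquet] promises.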

Exercises

Exercise 11. Complete the AFP registration program.

Exercise 12. Develop the function choose-toppings. It consumes a list of strings (pizza
toppings) and presents all of them as boolean queries on a single Web page. It produces
the list of all checked choices and makes sure that it contains at least one item.

Exercise 13. Modify the “suspend and resume” service for the Cherwell from
exercise 9. Use forms and form-query so that the appropriate pieces of information are
requested on a single page.

Exercise 14. Develop a pizza ordering site where customers in a store’s database can 
login and order a pizza. The site provides the following pages: 

1. login: request name and password, also ask whether delivery is desired;

2. pizza specifications: determine the pizza that the consumer would like:
   kind: chicago, new york, california, hawaii;
   size: small, medium, large, texas;
   toppings: eggplant, banana, broccoli, anchovies, eggs.

3. if the consumer chose the delivery option, request delivery information now;
assume that the store only needs the street address and the zip code (a number with 5
digits);

4. confirmation: print summary of all information, including the cost of the pizza;
once the consumer proceeds, the pizza is ordered and the servlet terminates.

The cost of a pizza is determined as follows. The kind determines the base cost; a
Chicago style pizza is $6.95, all others are $5.95. The size adds some cost: for a medium
add $1, for a large $5, and for a “texas” add $7. Finally, the first topping is free, the
others cost $.50 per topping. The money is due upon pickup or delivery.

5 Advanced Web Scripting 

The simple servlet API provides only a mechanism for requesting information via sim- 
plistic forms. Full-fledged, interactive Web programs, however, require finer forms of 
control than that. Such programs must be able to determine the layout of a page; a single 
page may have to provide different points where control can return to the program; and 
the program may have to use other media than pages to conduct a dialog (e.g., email).

For these reasons, PLT Scheme provides the servlet.ss library, a low-level variant
of servlet2.ss. The servlet.ss library provides the kind of power that full-fledged
Web programs need. This section introduces servlet.ss; it assumes some
basic knowledge of XHTML. The first subsection describes a representation of XHTML
in Scheme and its visual representation within DrScheme. The second subsection
introduces the functions for interacting with consumers. The last subsection illustrates how
to design interactive programs with these tools.



5.1 X-expressions 



Designing full-fledged Web programs requires the design of Web pages and associating 
code with these pages. To this end, PLT Scheme provides an embedding of XHTML — 
the XML dialect of HTML — into Scheme. Since an XML dialect such as XHTML is a 
“fully parenthesized” language for data, it is easy to define a subset of S-expressions as 
a representation of XHTML. This is known as the set of X-expressions (short: X-exp). 

Figure 8 specifies the current mapping from XHTML (indeed XML) to a subset of 
S-expressions. PLT Scheme also provides a library of functions for parsing XML into 
X-expressions and for printing X-expressions as XML, though because the servlet API 
takes care of these mappings, there is no need for a servlet programmer to know about 
these functions. 

For a good understanding of X-expressions, it is best to consider a series of exam- 
ples. Here is a canonical Web page and the X-expression that produces it: 



<html>
  <head>
    <title>My first Web page</title>
  </head>
  <body bgcolor="red">
    <h3>My first Web page</h3>
    <p />
    <p>Hello World</p>
  </body>
</html>

'(html
   (head
     (title "My first Web page"))
   (body ([bgcolor "red"])
     (h3 "My first Web page")
     (p)
     (p "Hello World")))



The page has a head element and a body element. The body has one attribute: a back- 
ground color. The example shows how convenient it is to write down large XHTML 
constants with quote. 

The real power of Scheme’s X-expressions, though, is derived from backquote,
unquote, and splicing. Here is a generalization of the first Web page:

;; n-th-web-page : String -> X-exp
(define (n-th-web-page n)
  (local ([define title (string-append "My " n " Web page")])
    `(html
       (head (title ,title))
       (body ([bgcolor "red"])
         (h3 ,title)
         (p)
         (p "Hello World")))))

Using n-th-web-page, it is now easy to produce the first, second, third and so on Web 
pages. 
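
To make the correspondence between X-expressions and XHTML text concrete, here is a small rendering sketch. It is written in Python and is not PLT Scheme's XML library: an element is modeled as a tuple whose first item is the tag, optionally followed by a dict of attributes, followed by the children, and `render` and `nth_web_page` are hypothetical names. Symbolic and numeric entities from figure 8 are left out.

```python
from html import escape

def render(xexp):
    """Render a nested-tuple model of an X-expression as an XHTML string."""
    if isinstance(xexp, str):
        return escape(xexp)                 # words
    tag, *rest = xexp
    attrs = {}
    if rest and isinstance(rest[0], dict):  # element with attributes
        attrs, *rest = rest
    attr_text = "".join(f' {name}="{escape(value)}"'
                        for name, value in attrs.items())
    if not rest:
        return f"<{tag}{attr_text} />"
    body = "".join(render(child) for child in rest)
    return f"<{tag}{attr_text}>{body}</{tag}>"

def nth_web_page(n):
    """Python counterpart of n-th-web-page: build the page for an ordinal."""
    title = "My " + n + " Web page"
    return ("html",
            ("head", ("title", title)),
            ("body", {"bgcolor": "red"},
             ("h3", title),
             ("p",),
             ("p", "Hello World")))

page = render(nth_web_page("second"))
```

The ordinary Python expression that builds the tuple plays the part that backquote and unquote play in the Scheme version: the title is computed once and placed into the page in two positions.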

To use X-expressions effectively for Web programming, it is still necessary to un- 
derstand XHTML. In particular, since forms often need some visually appealing layout, 
a Web programmer should know the ins and outs of XHTML’s tables. Furthermore, 
servlet programming requires a thorough understanding of forms, a subset of X-exp. A 
form is an XHTML element that enables consumers to supply information. By sending 
a form to a browser, a program can request texts, choices, passwords, etc. The bottom 
half of figure 8 specifies the representation of XHTML forms in Scheme. Although a 
form may contain arbitrary X-expressions, its main purpose is to gather information 
from the consumer via input elements. 

An input element comes with two important attributes: ’type and ’name. The
’name attribute is explained in the next subsection; the ’type attribute specifies what
kind of input the element gathers. A "text" element is rendered as a text field. A
"password" element is a text field that hides what a consumer types. A "check" element
denotes an on-off selection box. A "radio" element is a box that presents several
mutually exclusive choices; all radio elements with the same name are related. A "submit"
element is rendered as a button; clicking on the button sends all the information that the
consumer specified in the input elements of a form to the program that is named in the
’action attribute of the form.

Figure 9 contrasts an XHTML page with an embedded form and its Scheme repre- 
sentation. The form contains two input fields without any code for arranging them. One 
asks for a name, and the other is a button that submits the name. 

Figure 10 shows how a form is integrated into Web programs. By convention, the 
form is the result of a function that consumes a string (for the ’action attribute) and
produces an X-expression. The figure also illustrates how programmers can co-mingle 
XHTML and Scheme directly. This programming environment functionality is espe- 
cially useful for maintaining pieces of XML that contain large string constants. 

X-exp:

X-exp =                                            ;; XML element
  (union
    String                                         ;; words
    (cons Symbol (cons Attributes (listof X-exp))) ;; element with attributes
    (cons Symbol (listof X-exp))                   ;; element without attributes
    Symbol                                         ;; symbolic entities such as &nbsp;
    Number)                                        ;; numeric entities like &#20;

Attributes =                                       ;; Attributes
  (listof (list Symbol String))                    ;; tags and attribute values

X-exp Forms:

F-expr =                                           ; forms
  (cons 'form
    (cons (listof F-attributes)
      (listof X-exp)))

F-attributes =                                     ; attributes for forms
  (list
    (list 'method (union "get" "post"))
    (list 'action URL-String))

F-input =                                          ; input element
  (list 'input I-attributes)

I-attributes =                                     ; attributes for inputs
  (list
    (list 'type I-type)
    (list 'name String))

I-type =                                           ; values for the type attribute
  (union
    "text"                                         ; a text field
    "password"                                     ; a password text field
    "check"                                        ; a check menu (turned on/off)
    "radio"                                        ; a radio menu (mutually exclusive)
    "submit")                                      ; a button

Fig. 8. PLT Scheme’s XML representation



5.2 The Full Servlet API

Figure 11 specifies the full API for PLT Scheme servlets. It provides two functions
for interacting with the browser: send/finish and send/suspend. The former consumes
an X-expression, sends it to the browser, and shuts down the servlet and all related
query points. The latter consumes a function, called a Suspender, which builds an
X-expression (typically, but not necessarily, a form) from a string that represents the
current continuation. The X-expression is then sent to the browser and the servlet is
suspended. When the consumer submits a response, the servlet execution resumes at the
point where it was suspended; the consumer’s response becomes the value of the
application of send/suspend.
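
The suspend-and-resume behavior can be approximated with a Python generator. This is only a sketch of the control flow, not of servlet.ss: `servlet` and `run` are hypothetical names, the pages are plain strings, the example.org URLs are placeholders for continuation URLs, and the consumer's responses are supplied as a canned list.

```python
# A sketch of suspend-and-resume with a Python generator; the names
# `servlet` and `run` are illustrative, not part of servlet.ss.

def servlet():
    """Each `yield` plays the role of send/suspend: it hands out a suspender
    (a function from a url to a page) and later evaluates to the response."""
    first = yield (lambda url: f"<form action='{url}'>First number?</form>")
    second = yield (lambda url: f"<form action='{url}'>Second number?</form>")
    # Returning the last page plays the role of send/finish.
    return f"The sum is {int(first) + int(second)}."

def run(program, responses):
    """Drive the servlet with canned responses; collect the pages it sends."""
    pages, k = [], 0
    try:
        suspender = next(program)
        for response in responses:
            # The url names the point at which the program is suspended.
            pages.append(suspender(f"http://example.org/k{k}"))
            k += 1
            suspender = program.send(response)
    except StopIteration as done:
        pages.append(done.value)
    return pages

pages = run(servlet(), ["3", "4"])
```

A generator can be resumed only once from each suspension point, so this sketch does not model a consumer who returns to an earlier page and submits it again.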
<html>
  <head>
    <title>Finger Gateway</title>
  </head>
  <body>
    <h3>Finger Gateway</h3>
    <form action="http://..." method="post">
      <input type="text" name="who" />
      <input type="submit" value="FINGER" />
    </form>
  </body>
</html>

'(html
   (head
     (title "Finger Gateway"))
   (body
     "Finger Gateway"
     (form ([action "http://..."]
            [method "post"])
       (input ([type "text"]
               [name "who"]))
       (input ([type "submit"]
               [name "FINGER"])))))



Fig. 9. An xhtml page & form and its Scheme representation 



[Screenshot of finger.ss in DrScheme. The first definition is written as a
quasiquoted X-expression; the second embeds an XML box in the Scheme code.]

;; String -> X-expr[html]
;; present a form for submitting a name to finger
(define (finger-page action-url)
  `(html
     (title "Finger Gateway 2")
     (body
       "Finger Gateway 2"
       (form ([action ,action-url] [method "post"])
         "Who do you wish to finger? "
         (input ([type "text"] [name "who"]))))))

;; (listof X-expr) -> String -> X-expr[html]
;; present a list of X-exprs and provide next link
(define (result-page lines)
  (lambda (action-url)
    [XML box: an html element whose div contains lines and a paragraph
     with a link to action-url]))

Fig. 10. XML boxes in DrScheme



Servlet Types:

Suspender = (String -> X-exp)           ; produces a Web page from a url

Request = (make-response ... Bin ...)   ; represents a consumer’s reply

Bin = ...                               ; ... it contains a set of bindings

Functions:

;; send/suspend : Suspender -> Request
;; send a Web page and suspend

;; send/finish : X-exp -> Request
;; send a Web page and finish off the servlet

;; build-suspender : (listof X-exp) (listof X-exp) -> Suspender
;; build a suspender from title and form-body
;; also consumes two optional values that specify attributes
(define (build-suspender title form-body) ...)

;; extract-binding/single : Symbol Bin -> String
;; extracting the response for tag;
;; the tag must occur exactly once
(define (extract-binding/single tag b) ...)

;; extract-bindings : Symbol Bin -> (listof String)
;; extracting all the responses for a given tag;
;; the list is empty if there is no such tag
(define (extract-bindings tag b) ...)

Fig. 11. The servlet api



The response from the consumer is a request for the program to continue. PLT 
Scheme represents the response with a Request structure, whose only important field 
is a set of bindings (Bin). The server/browser constructs these bindings from the ’name 
attributes and the current values of the input fields. For example, if a consumer clicks 
on one of several buttons, the name of that button is in the request that the browser 
submits, and the name of no other button shows up. As figure 11 shows, bindings can
be accessed with two selector functions, which are similar to (but not the same as) 
extract and extract/single from servlet2.ss. 
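
The remark about buttons suggests a simple way to find out which button a consumer pressed. The sketch below is in Python and assumes that bindings are a list of (name, value) pairs; `parse_qsl` from the standard library stands in for the server's parsing of a submitted form, while `clicked`, the field name cell, and the button names are hypothetical.

```python
from urllib.parse import parse_qsl

def extract_bindings(tag, bindings):
    """Return every value bound to tag (an empty list if there is none)."""
    return [value for name, value in bindings if name == tag]

def clicked(buttons, bindings):
    """Return the names of the given buttons that occur in the bindings."""
    return [button for button in buttons if extract_bindings(button, bindings)]

# Body of a hypothetical form submission: one text field plus the single
# button that the consumer pressed.
bindings = parse_qsl("cell=42&update=Update")
pressed = clicked(["update", "restart", "logout"], bindings)
```

Because the names of the other buttons do not show up in the bindings, `pressed` contains only the name update.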

5.3 Exploiting the Full API


Based on the full servlet API, a programmer can implement a number of new Web 
interactions. To illustrate this point, consider the problem of presenting a number of 
choices and enabling the consumer to view the details of these choices. Many Web sites 
present lists of choices as lists of links. Each of the links leads to a page that presents 
the details of a choice. A Web program that presents such a list of links must therefore 
create different links, yet, all links must also invoke a program that is aware of the 
preceding dialog. In XHTML, such a link has the shape 

http://www.drscheme.org/servlets/test.ss?choice=1

The part that follows the question mark is called a query. A server parses the query into a set of bindings.
Using queries, a browser can thus submit arguments and a Web program can distinguish 
among otherwise equal URLs. 

Generating such URLs is straightforward: 

;; present-a-choice : URL-String String -> X-exp
;; generate a paragraph'ed link to a “choice url”
(define (present-a-choice url choice)
  `(p (a ([href ,(string-append url "?choice=" choice)]) ,choice)))

The function consumes two strings: a base url and a word that identifies a choice. The
choice becomes simultaneously a part of the final URL in an a element (an XHTML link) and
the visible part of the link.
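
The round trip from a choice to a URL and back can be sketched with Python's standard library. The functions `choice_url` and `chosen` are hypothetical names; unlike present-a-choice, the sketch percent-encodes the choice with `quote`, an addition that matters only for choices that contain characters such as spaces.

```python
from urllib.parse import parse_qs, quote, urlsplit

def choice_url(url, choice):
    """Append the choice to the base url as a query, as present-a-choice does."""
    return url + "?choice=" + quote(choice)

def chosen(url):
    """Recover the choice from a url produced by choice_url."""
    return parse_qs(urlsplit(url).query)["choice"][0]

base = "http://www.drscheme.org/servlets/test.ss"
link = choice_url(base, "AFP")
```

Two links built from the same base url differ only in their queries, which is how the program can tell them apart.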

The program in figure 12 uses this technique to deal with choices and their details. 
The function first-page generates a list of choices: GEN and AFP. Each of them is a link 
produced by present-a-choice. When a consumer clicks on one of the choices, second-page
identifies the choice, presents the details, and waits for a confirmation. Finally,
after the consumer confirms a choice from a detail page, last-page confirms the choice 
and shuts down the servlet. 

Exercises

Exercise 15. Conduct the Orbitz experiment (see ex. 1) with the choice program.

Exercise 16. Develop a simplistic spreadsheet with the full servlet interface. The site
consists of three pages whose purpose is to

1. ask the consumer for a number of rows and columns: n, m.

2. display an n x m matrix of text fields and request numbers.

3. compute the sums of all rows and columns and display the result as a table of
(n + 1) x (m + 1) squares. The page should also offer three choices for continuing:
update, re-start, and log out. The first re-computes the sums, presumably after a
modification to a cell. The second and third are self-explanatory.

;; first-page : true -> Bin
;; present some choices
(define (first-page t)
  (request-bindings
    (send/suspend
      (lambda (url)
        `(div ,(present-a-choice url "GEN") ,(present-a-choice url "AFP"))))))

;; second-page : Bin -> Bin
;; show details, well, just show what the person chose
(define (second-page b)
  (local ([define _
            (inform "Choice" "You chose " (extract-binding/single 'choice b) "."
                    "Please click on the link below to confirm your choice.")])
    b))

;; last-page : Bin -> empty
;; confirm the choice
(define (last-page b)
  (final-page "Thank you"
              "Thank you for choosing " (extract-binding/single 'choice b) " ."
              "We know you have a choice of summer schools, "
              "and we appreciate your business. "))

;; run, program, run:
(last-page (second-page (first-page true)))

Fig. 12. A Web program for dealing with choices



Exercise 17. The final project is to develop a workshop sign-up page for the Advanced
Summer School on Functional Programming. For details on the summer school’s
sign-up process, please see

http://web.comlab.ox.ac.uk/oucl/research/areas/ap/afp/registration.txt

which is a text version that people needed to fax in.

Add the teachpack servlet.ss. Use DrScheme’s PLT Pretty Big language level for this exercise.

Hint 1: In PLT’s experience, workshop participants often misspell email addresses. To 
confirm that an email address works properly, the Web program should send the url (for 
the continuation) to the consumer who is signing up. The consumer can then use this 
link to resume the sign-up process. To send email, use the smtp-send-message function 
from the smtp library (search the Help Desk for details).

Hint 2: Write the information for a participant to a file in S-expression syntax. Use 

(with-output-to-file "participant-information"
  (lambda () (display (list "NM" ...)))
  'append)

(see Help Desk) for this purpose. The expression appends an S-expression to the file
"participant-information" in one atomic step.

Finally, the ambitious student may also provide a Web program that enables the
workshop secretary to view and possibly edit the participant files via a Web dialog.



The first item in the following list is PLT’s text book on designing programs; the
remaining articles deal with designing interactive Web programs:

1. Felleisen, M., R.B. Findler, M. Flatt, S. Krishnamurthi. How to Design
Programs. MIT Press, 2001.

2. Graunke, P., R. Findler, S. Krishnamurthi, M. Felleisen. Automatically
restructuring programs for the web. In Proc. 2001 Automated Software
Engineering, 211-222.

3. Graunke, P., S. Krishnamurthi, S. van der Hoeven, M. Felleisen.
Programming the web with high-level programming languages. In Proc. 2001
European Symposium on Programming, 122-136.

4. Hughes, J. Generalizing monads to arrows. Sci. Comp. Prog. 37, 2000. 67-111.

5. Queinnec, C. The influence of browsers on evaluators or, continuations to
program web servers. In Proc. 2000 International Conference on Functional
Programming, 2000, 23-33.

6. Thiemann, P. WASH/CGI: Server-side web scripting with sessions and typed,
compositional forms. In Proc. 2002 Practical Applications of Declarative
Languages, 192-208.

Like these lecture notes, these articles deal with the control problems that arise from the
use of Web browsers for interacting with consumers in a proper manner.



Abstract. In these lecture notes, we give an overview of concurrent,
distributed, and mobile programming using JoCaml. JoCaml is an ex- 
tension of the Objective Caml language. It extends OCaml with support 
for concurrency and synchronization, the distributed execution of pro- 
grams, and the dynamic relocation of active program fragments during 
execution. 

The programming model of JoCaml is based on the join calculus. This 
model is characterized by an explicit notion of locality, a strict adher- 
ence to local synchronization, and a natural embedding of functional 
programming à la ML. Local synchronization means that messages always
travel to a set destination, and can interact only after they reach
that destination; this is required for an efficient asynchronous implemen- 
tation. Specifically, the join calculus uses ML’s function bindings and 
pattern-matching on messages to express local synchronizations. 

The lectures and lab sessions illustrate how to use JoCaml to program 
concurrent and distributed applications in a much higher-level fashion 
than the traditional threads-and-locks approach. 



1 An Overview of JoCaml 

Wide-area distributed systems have become an important part of modern pro- 
gramming, yet most distributed programs are still written using traditional lan- 
guages designed for closed, sequential architectures. In practice, distribution is- 
sues are typically relegated to system libraries, external scripts, and informal de- 
sign patterns [4, 19], with little support in the language for asynchrony and con- 
currency. Conversely, the distributed constructs, when present, are constrained 
by the local programming model, with for instance a natural bias towards RPCs 
or RMIs rather than asynchronous message passing, and a tendency to hide these 
issues behind sequential, single-threaded interfaces. 

Needless to say, distributed programs are usually hard to write, much harder 
to understand and to relate to their specifications, and almost impossible to 
debug. This is due to essential difficulties, such as non-determinism, asynchrony, 
and node failures. Nonetheless, it should be possible to provide some language 
support and tools to facilitate distributed programming. 



J. Jeuring and S.P. Jones (Eds.): AFP 2002, LNCS 2638, pp. 129—158, 2003. 
© Springer-Verlag Berlin Heidelberg 2003 






C. Fournet 



JoCaml is an attempt to provide such a high-level language in a functional 
setting, with linguistic support for asynchronous, distributed, and mobile
programming [6, 15]. JoCaml is based on the join calculus [7, 8], a simple and
well-defined model of concurrency similar to the pi calculus [22, 21] but more suitable
for programming. The join calculus is the core language for JoCaml and its pre- 
decessors, and has inspired the design of several other languages [3, 23, 27, 28]. 
More formally, the join calculus is also a specification language, that can be 
used to state and prove the properties of programs, such as the correctness of 
an implementation [10, 1]. 

JoCaml is an extension of Objective Caml 1.07 [20], a typed programming 
language in the ML family with a mix of functional, imperative, and object- 
oriented features. JoCaml extends OCaml, in the sense that OCaml programs 
and libraries are just a special kind of JoCaml programs and libraries. JoCaml 
also implements strong mobility and provides support for distributed execution, 
including a dynamic linker and a distributed garbage collector. 

These notes give an overview of JoCaml, and how it can be used for both con- 
current and distributed programming. First, we survey the basic ideas behind 
JoCaml as regards concurrent and distributed programming. Then, we intro- 
duce JoCaml constructs, their syntax, typing, and informal semantics, first in 
a concurrent but local setting (Section 2), and finally in a distributed setting 
(Section 3). 

Starting from Objective Caml. High-level distributed programming mostly 
relies on scripting languages (Agent-Tcl [11], TACOMA [14], Telescript [29]). 
Such languages are often specialized for some specific task or architecture, and 
may offer poor performance for other operations, typically relegated to external
calls in other languages. Besides, for the sake of flexibility, these languages don’t 
provide much structure, such as modules, interfaces, classes, and user-defined 
types, to develop and maintain complex software projects. 

JoCaml is based on Objective Caml (OCaml), a compiled, general purpose, 
high-level programming language, which combines functional, imperative and 
object-oriented programming styles. OCaml is a great language, with several 
features that are especially relevant for our purpose: 

- Programs are statically typed, in a reasonably expressive type system with
  parametric polymorphism, type inference, subtyping for objects, user-defined
  data-types, and a rich module system.
  This is especially important in distributed systems, where there are many
  opportunities to assemble inconsistent pieces of software, and where debugging
  runtime type errors is problematic.

- As a programming environment, OCaml provides both native-code and bytecode
  compilers, with separate compilation and flexible linking. The latter
  compiler is important to implement code mobility at runtime.

- The OCaml runtime has good support for system programming, such as the
  ability to marshal and unmarshal any data types, even between heterogeneous
  platforms; an efficient garbage collector, which we have extended for
  distributed collection; and low-level access to the system, with many Unix
  system calls (POSIX threads, sockets, . . . ).

Indeed, OCaml has been used to develop many complex programs for distributed
systems, such as Web browsers with applets (MMM [26]), group communication
libraries (Ensemble [12]), and Active Networks (SwitchWare [2]), and to
experiment on a variety of parallel machines.



Adding Concurrent Programming. OCaml is a sequential language:
expressions are evaluated in call-by-value, in a deterministic manner. As a first
language extension, JoCaml provides support for lightweight concurrency,
message passing, and message-based synchronization.

To begin with, we introduce a new expression, spawn process ; expression, 
that executes process and evaluates expression in parallel. The respective op- 
erations in process and expression run independently; their interleaving is a 
first source of non-determinism. Also, process is not quite an expression — it is 
not meant to return a result — so we introduce a new syntactic class for (asyn- 
chronous) processes that is recursively defined with (synchronous) expressions. 

Processes can be seen as virtual threads, running in parallel, in no particular 
order. The JoCaml compiler and runtime are responsible for mapping these 
processes to a few system threads. 
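
The effect of spawn process ; expression can be imitated with an ordinary thread. The following Python sketch is an analogy, not JoCaml: the helper `spawn` is a hypothetical name, it creates one system thread per process rather than mapping many processes onto a few threads, and it waits for the process at the end only so that the example terminates deterministically.

```python
import threading

def spawn(process, expression):
    """Run `process` in its own thread while evaluating `expression` here."""
    worker = threading.Thread(target=process)
    worker.start()
    value = expression()   # only the expression produces a result
    worker.join()          # not part of spawn; keeps the example deterministic
    return value

log = []
result = spawn(lambda: log.append("process ran"), lambda: 6 * 7)
```

The process contributes no value to `result`; it is visible only through its effect on `log`, which mirrors the distinction between processes and expressions.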

Instead of producing values, processes interact by sending asynchronous messages
on channels.¹ Indeed, an asynchronous message is itself a process. Accordingly,
JoCaml also introduces channels and local channel definitions for processes,
much like functions and let fun bindings for expressions:

— Channels are first-class values, with a communication type, which can be 
used to form expressions and send messages (either as the message address 
or as message content). 

— Channel definitions bind channel names, with a static scope, and associate 
guarded processes with these names. Whenever messages are passed on these 
names, copies of these processes are fired. 

So far, our extension still lacks expressiveness. We can generate concurrent 
computations but, conversely, there is no means of joining together the results of 
such computations or, for that matter, of having any kind of interaction between 
spawned processes. We need some synchronization primitives. 

A whole slew of stateful primitives have been proposed for encapsulating var- 
ious forms of inter-process interaction: concurrent variables, semaphores, mes- 
sage passing, futures, rendez-vous, monitors, . . . just to name a few. JoCaml 
distinguishes itself by using that basic staple of ML programming, definition 
by pattern-matching, to provide a declarative means for specifying inter-process 
synchronization, thus leaving state inside processes, where it rightfully belongs. 

¹ In addition to message passing, processes can still cause any OCaml side-effects, 
such as writing a shared mutable cell in a sub-expression; however, these effects are 
usually harder to trace than messages. 

Concretely, this is done by allowing the joint definition of several channels by
matching concurrent message patterns on these channels; in a nutshell, by allowing
parallel composition on the left-hand side of channel definitions. 

These synchronization patterns, first introduced in the join calculus, are 
equivalent to more traditional forms of message passing (with dynamic senders 
and receivers on every channel) in terms of expressiveness, but offer several ad- 
vantages from a programming language viewpoint. 

1. For each channel definition, all synchronization patterns are statically known, 
so they can be efficiently compiled using for instance state automata [17]. 

2. Similarly, type systems can analyze all contravariant occurrences of channels, 
then generalize their types (cf. Section 2.5). 

3. As regards distributed implementations, the static definition of channels 
(also known as locality in concurrency) enables an efficient implementation 
of routing: for a given channel, there exists a single definition that can handle
the message, and the machine that hosts the definition is the only place
where the message can participate in a synchronous step. 

(Section 2 will illustrate the use of join patterns, and relate them to several other
synchronization primitives.) 

At this stage, we have a natural extension of (impure) functional programming
with lightweight concurrency and synchronization. Next, we explain how 
this language can be used across several machines on an asynchronous network, 
with distributed message passing and process mobility. 



Adding Distribution and Mobility. Before discussing any form of commu- 
nication between JoCaml programs, we refine our model and give an explicit 
account of locality. In particular, we must be able to represent several runtimes 
and their local processes on the same network. In the join calculus, the basic 
unit of locality is called a “location”. 

Locations have a nested structure, so that a given location can recursively 
contain sub-locations. This model is adequate to represent a network architec- 
ture, where OS processes are running inside a computer, computers are gathered 
inside LANs, themselves included inside Autonomous Systems, themselves orga- 
nized in a hierarchical model. 

In a JoCaml executable, the whole program is itself a location, called the 
root location, and implicitly managed by the runtime. Additional locations can 
be explicitly created, within an existing location, using a special declaration 
that describes the content of the new location (running processes, channels, 
even sub-locations) and gives it a fresh location name. Hence, a distributed 
configuration of machines running JoCaml programs can be seen as a location 
tree, each location hosting its own definitions and processes. 

As a first approximation, locations are transparent: channels have a global 
lexical scope, so that any process that has received a channel name can use it 
to send messages, and can forward it to other processes, independently of the 
location that defines the name. In other words, from any given configuration,
we could in principle erase all location boundaries and obtain a single “global 
location” that contains all definitions and processes, and we would still get the 
same communications. 

In addition, locations can be used to control locality. Specifically, locations 
and location names have multiple roles in JoCaml programs: 

— Locations represent units of mobility. At any time, a location and its live 
content can decide to migrate from its current machine to another one; this 
can be used to model mobile objects, threads, applets, agents . . . Conversely, 
location names can be passed in messages, then used as “target addresses” for 
such migrations. This is especially convenient to relocate parts of a program 
as part of the computation. 

— Locations represent atomic units of failure, and can be used as targets for 
failure detection. It is possible to halt a location, atomically stopping the 
execution of all the processes and sub-locations included in the location. This 
can be used to discard parts of the computation without restarting the whole 
configuration. Besides, low-level system failures can be cleanly reflected in 
terms of “spontaneous” halts of root locations. Conversely, location names 
can also be used to detect the failure of remote locations, and trigger some 
failure recovery code. 



Further reading. The latest JoCaml release contains a reference manual, a 
series of sample programs, and a more extensive tutorial [15]. Some aspects of the 
implementation are described elsewhere in more detail: the distributed garbage 
collector [18, 16]; the type system [9]; the compilation of join patterns [17]. 

A gentle introduction to the more formal aspects of the join calculus can be 
found in [8], which further discusses its relation to functional programming, surveys
its equational theory and proof techniques, and gives operational semantics 
for concurrency, distribution, and mobility. 

2 Concurrent Programming 

This section introduces the concurrent and asynchronous aspects of JoCaml, 
using a series of programming examples. Section 3 will deal with the distributed 
and mobile aspects. We assume some familiarity with functional programming 
languages, and in particular Objective Caml. 



Notations for Programs. The JoCaml top-level provides an interactive envi- 
ronment, much as the OCaml top-level. Programs consist of a series of top-level 
statements, terminated by an optional “;;” that triggers evaluation in interactive
mode. Accordingly, our examples are given as JoCaml statements, followed 
by the output of the top-level. For instance: 

# let x = 1 ;;
val x : int
# print_int (x+1) ;;
2 

In order to experiment with the examples, you can either type them in a 
top-level, launched by the command joctop, or concatenate program fragments
in a source file a.ml, compile it with the command joc -i a.ml (-i enables
the output of inferred types), and finally run the program with the command
./a.out, as performed by the scripts that generate these proceedings. 

2.1 Expressions and Processes 

JoCaml programs are made of expressions and processes. Expressions are eval- 
uated synchronously, as usual in functional languages. Indeed, every OCaml 
expression is also a JoCaml expression. 

Processes are executed asynchronously and do not produce any result value, 
but they can communicate by sending messages on channels (a.k.a. port names). 
Messages carried by channels are made of zero or more values, which may in turn 
contain channels. 

Simple Channel Declarations. Channels, or port names, are the main new 
primitive values of JoCaml. Port names are either asynchronous or synchronous, 
depending on their usage for communications: an asynchronous channel is used 
to send a message; a synchronous channel is used to send a message and wait 
for an answer. 

Channels are introduced using a new let def binding, which should not 
be confused with the ordinary value let binding. The right hand-side of the 
definition of a channel is the process spawned for every message sent on that 
channel, after substituting the content of the message for the formal parameters 
on the left hand-side: in short, channels are process abstractions. 

For instance, we can define an asynchronous echo channel as follows: 

# let def echo! x = print_int x; ;; 
val echo : <<int>> 

The new channel echo has type <<int>>, the type of asynchronous channels 
carrying values of type int. Sending an integer i on echo fires an instance of the 
guarded process print_int i; which prints the integer on the console. Since 
echo is asynchronous, the sender does not know when the actual printing takes 
place. Syntactically, the presence of ! in the definition of the channel indicates
that this channel is asynchronous. This indication is present only in the channel
definition, not when the channel is used. Also, on the right hand-side, print_int
i is an expression that returns the unit value (), so it is necessary to append
a ; to obtain a process that discards this value. 

The definition of a synchronous print channel is as follows: 

# let def print x = print_int x; reply ;; 
val print : int -> unit 

The new channel print has type int -> unit, the functional type that takes
an integer and returns a void result. However, print is introduced by a let def
binding (with no !), so it is a synchronous channel, and its process on the right 
hand-side must explicitly send back some (here zero) values as results using a 
reply process. This is an important difference with functions, which implicitly 
return the value of their main body. Message sending on print is synchronous, 
in the sense that the sender knows that console output has occurred when print 
returns (). 

Message sending on synchronous channels occurs in expressions, as if they 
were functions, whereas message sending on asynchronous channels occurs in 
processes. (The type-checker flags an error whenever a channel is used in the 
wrong context.) In contrast with value bindings in OCaml, channel definitions 
always have recursive scopes. 

In contrast with traditional channels in process calculi such as CCS, CSP [13], 
or the pi calculus [22, 21], channels and the processes that receive messages on 
those channels are statically defined in a single let def language construct. As 
a result, channels and functions become quite similar. 

Processes. Processes are the new core syntactic class of JoCaml. The most basic 
process sends a message on an asynchronous channel, such as the channel echo 
defined above. Since only declarations and expressions are allowed at top-level, 
processes are turned into expressions by “spawning” them: they are introduced
by the keyword spawn followed by a process in braces “{ }”:

# spawn { echo 1 } ;;
# spawn { echo 2 } ;;
12 

Spawned processes run concurrently. The program above may echo 1 and 2 
in any order, so the output above may be 12 or 21, depending on the imple- 
mentation. Concurrent execution may also occur within a process built using 
the parallel composition operator “|”. For instance, an equivalent, more concise
alternative to the example above is

# spawn { echo 1 | echo 2 } ;;
21 

Composite processes also include conditionals (if then else), functional 
matching (match with) and local bindings (let in and let def in). Process 
grouping is done by using braces “{ }”. For instance, the top-level statement

# spawn { let x = 1 in
#   { let y = x+1 in echo y | echo (y+1) } | echo x } ;;
=> 132 

may output 1, 2, and 3 in any order. Grouping around the process let y = x+1 
in . . . restricts the scope of y, so that echo x can run independently of the 
evaluation of y. 






Expressions. As usual, expressions run sequentially (in call-by-value) and, 
when they converge, produce some values. They can occur at top-level, on the 
right-hand side of value bindings, and as arguments to message sending. Expression
grouping is done by using parentheses “( )”. Apart from OCaml expressions, 
the most basic expression sends values on a synchronous channel and waits for 
some reply: 

# let x = 1 in print x; print (x+1) ;;
=> 12 

Both expressions print x and print (x+1) evaluate to the empty result (), 
which can be used for synchronization: the program above always outputs 12. 

Sequences may also occur inside processes. The general form of a sequence 
inside a process is expression ; process, where the result of expression will be 
discarded. As expression can itself be a sequence, we can write for instance spawn 
{ print 1 ; print 2 ; echo 3 }. 

Channels as Values. Channel names are first-class values in JoCaml, which 
can be locally created, then sent and received in messages. (From a concurrency 
viewpoint, this is often called name mobility [21], and this provides much of the 
expressiveness to the pi calculus and the join calculus.) 

In particular, we can write higher order functions and ports, such as 

# let async f = let def a! x = f x; in a
# let filter f ch = let def fch! x = ch (f x) in fch
# let def multicast clients =
#   let def mch! x =
#     let cast client = spawn {client x} in
#     { List.iter cast clients; } in
#   reply mch ;;
val async : ('a -> 'b) -> <<'a>>
val filter : ('c -> 'd) -> <<'d>> -> <<'c>>
val multicast : <<'e>> list -> <<'e>>

async turns a synchronous channel (or a function) into an asynchronous channel, 
by discarding its result; filter f ch creates a channel that applies f to every 
received message then forwards the result on ch; multicast clients creates a 
channel that forwards messages to all client channels. 

The types for these names and channels are polymorphic: they include type 
variables such as ’a that can be replaced with any type. In the example below, 
for instance, 'a is instantiated to string. (^ is OCaml string concatenation.) 

# let echo_string = async print_string
# let tell n = filter (fun x -> x^", "^n^"\n") echo_string
# let yell = multicast [ tell "Cedric"; tell "Fabrice" ]
# ;;
# spawn { yell "Ciao" | yell "Hi" } ;;
val echo_string : <<string>>
val tell : string -> <<string>>
val yell : <<string>>

Hi, Cedric
Hi, Fabrice
Ciao, Cedric
Ciao, Fabrice 
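
The three combinators above can be approximated in plain OCaml. The following is only a sequential sketch under a strong assumption: an asynchronous channel is modelled as an ordinary function of type 'a -> unit, so nothing runs concurrently and the output order is fixed. The buffer named out is a hypothetical stand-in for the console.

```ocaml
(* Sequential sketch: a "channel" is modelled as a function 'a -> unit.
   This is an assumption of the sketch, not how JoCaml implements channels. *)

(* async: turn a function into a channel by discarding its result *)
let async (f : 'a -> 'b) : 'a -> unit = fun x -> ignore (f x)

(* filter: apply f to each message, then forward the result on ch *)
let filter (f : 'a -> 'b) (ch : 'b -> unit) : 'a -> unit = fun x -> ch (f x)

(* multicast: forward each message to every client channel *)
let multicast (clients : ('a -> unit) list) : 'a -> unit =
  fun x -> List.iter (fun client -> client x) clients

(* Hypothetical output buffer standing in for the console *)
let out = Buffer.create 64

let echo_string = async (Buffer.add_string out)
let tell n = filter (fun x -> x ^ ", " ^ n ^ "\n") echo_string
let yell = multicast [ tell "Cedric"; tell "Fabrice" ]

let () =
  yell "Ciao";
  yell "Hi";
  print_string (Buffer.contents out)
```

Because the sketch is sequential, the two greetings are always delivered in program order, whereas the JoCaml program may interleave them.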



2.2 Synchronization by Pattern Matching 

Join patterns extend port name definitions with synchronization. A pattern de- 
fines several ports simultaneously and specifies a synchronization condition to 
receive messages on these ports. For instance, the following statement defines 
two synchronizing port names fruit and cake: 

# let def fruit! f | cake! c = print_string (f^" "^c^"\n"); ;;
val cake : <<string>>
val fruit : <<string>>

To trigger the guarded process print_string (f^" "^c^"\n"), messages must
be sent on both channels fruit and cake. 

# spawn { fruit "apple" | cake "pie" } ;;
apple pie

The parallel composition operator “|” appears both in join-patterns and in 
processes. This highlights the message combinations consumed by the pattern. 
The same pattern may be used many times, as long as there are enough messages 
to consume. When several matches are possible, which messages are consumed 
is left to the implementation. 

# spawn { fruit "apple" | fruit "raspberry" |
#         cake "pie" | cake "crumble" | cake "jelly" } ;;
raspberry pie
apple crumble 
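
To make the matching behaviour concrete, here is a minimal OCaml sketch of a two-channel join, assuming OCaml 5 (where Mutex is part of the standard library). The names make_join, send_a and send_b are hypothetical. Each channel is backed by a queue of pending messages, and the guarded body fires whenever both queues hold a message. Unlike JoCaml, the body runs in the thread of whichever sender completes the match, and messages are matched in arrival order.

```ocaml
(* Sketch of a two-channel join pattern; all names are illustrative. *)
let make_join (body : 'a -> 'b -> unit) : ('a -> unit) * ('b -> unit) =
  let m = Mutex.create () in
  let qa : 'a Queue.t = Queue.create () in
  let qb : 'b Queue.t = Queue.create () in
  (* Called with m held: consume one message per channel if both exist. *)
  let try_match () =
    if Queue.is_empty qa || Queue.is_empty qb then None
    else begin
      let a = Queue.pop qa in
      let b = Queue.pop qb in
      Some (a, b)
    end
  in
  (* Look for a complete match and run the guarded body outside the lock. *)
  let fire_if_possible () =
    Mutex.lock m;
    let matched = try_match () in
    Mutex.unlock m;
    match matched with
    | Some (a, b) -> body a b
    | None -> ()
  in
  let send_a a =
    Mutex.lock m; Queue.push a qa; Mutex.unlock m;
    fire_if_possible ()
  in
  let send_b b =
    Mutex.lock m; Queue.push b qb; Mutex.unlock m;
    fire_if_possible ()
  in
  (send_a, send_b)

(* Hypothetical output buffer standing in for the console *)
let out = Buffer.create 64

let fruit, cake =
  make_join (fun f c -> Buffer.add_string out (f ^ " " ^ c ^ "\n"))

let () =
  fruit "apple"; fruit "raspberry";
  cake "pie"; cake "crumble"; cake "jelly";
  print_string (Buffer.contents out)
```

In this run the message "jelly" stays pending, since no fruit message is left to match it; the same happens in the JoCaml example.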

Composite join-definitions can also specify several synchronization patterns 
for the same defined channels. 

# let def
#      tea! () | coin! () = print_string "Here is your tea\n";
#   or coffee! () | coin! () = print_string "Here is your coffee\n";
# ;;
# spawn { tea() | coffee() | coin() } ;;
val coin : <<unit>>
val tea : <<unit>>
val coffee : <<unit>>

Here is your coffee

The name coin is defined only once, but can take part in two synchronization
patterns. This co-definition is expressed by the keyword or. As illustrated here, 
there may be some internal choice between several possible matches for the same 
current messages. 

Join-patterns are the programming paradigm for concurrency in JoCaml. 
They allow the encoding of many concurrent data structures. For instance, the 
following code defines a counter: 

# let def count! n | inc () = count (n+1) | reply to inc
#      or count! n | get () = count n | reply n to get ;;
#
# spawn {count 0} ;;
val inc : unit -> unit
val count : <<int>>
val get : unit -> int 

This definition calls for two remarks. First, a join-pattern may mix synchronous
and asynchronous messages, but when there are several synchronous messages,
each reply construct must specify the name to which it replies, using the new
reply ... to name construct. When there is a single synchronous name in the
pattern, as in the example above, the to construct is optional. 

Second, the usage of name count is a typical way of ensuring mutual ex- 
clusion. For the moment, assume that there is at most one active invocation 
on count. When one invocation is active, count holds the counter value as a 
message and the counter is ready to be incremented or examined. Otherwise, 
some operation is being performed on the counter and pending operations are 
postponed until the operation being performed has left the counter in a consis- 
tent state. As a consequence, the counter may be used consistently by several 
threads. 

# let def wait! t =
#   if get()<3 then wait (t+1) else {print_int t;} in
# spawn { wait 0 | {inc(); inc();} | inc(); } ;;
=> 1 

Ensuring the correct counter behavior in the example above requires some 
programming discipline: exactly one initial invocation on count has to be made.
If there is more than one simultaneous invocation on count, then mutual 
exclusion is lost. If there is no initial invocation on count, then the counter is 
deadlocked. This can be prevented by making the count, inc and get names 
local to a new_counter definition and then exporting inc and get while hiding 
count, inside the internal lexical scope of the definition: 

# let def new_counter () =
#   let def count! n | inc0 () = count (n+1) | reply
#        or count! n | get0 () = count n | reply n in
#   count 0 | reply inc0, get0 ;;
# let inc, get = new_counter () ;;
val new_counter : unit -> (unit -> unit) * (unit -> int)
val inc : unit -> unit
val get : unit -> int 

This programming style is reminiscent of imperative “object-oriented” program- 
ming: a counter is a thing called an object, it has some internal state (count 
and its argument), and it exports some methods to the external world (here, 
inc and get). The constructor new_counter creates a new object, initializes its 
internal state, and returns the public methods. Then, a program may allocate 
and use several counters independently. 
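
For comparison, the same object-like interface can be sketched in OCaml with a hidden reference and an explicit lock, assuming OCaml 5 (Mutex and Domain in the standard library). Here the mutex plays the role that the single pending count message plays in JoCaml; the names lock and worker are hypothetical.

```ocaml
(* Sketch: a counter whose state is hidden by lexical scoping. *)
let new_counter () =
  let lock = Mutex.create () in
  let count = ref 0 in
  let inc () =
    Mutex.lock lock;
    incr count;
    Mutex.unlock lock
  in
  let get () =
    Mutex.lock lock;
    let n = !count in
    Mutex.unlock lock;
    n
  in
  (inc, get)

let () =
  let inc, get = new_counter () in
  (* Two domains increment the same counter concurrently. *)
  let worker () = for _ = 1 to 1000 do inc () done in
  let d = Domain.spawn worker in
  worker ();
  Domain.join d;
  Printf.printf "%d\n" (get ())
```

As in the JoCaml version, callers only see inc and get, so they cannot break the mutual exclusion invariant.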

2.3 Concurrency Control 

Join-pattern synchronization can express many common programming styles, 
either concurrent or sequential. Next, we give basic examples of abstractions for 
concurrency. 

Synchronization Barriers. A barrier is a common synchronization mecha- 
nism. Basically, barriers represent explicit synchronization points, also known as 
rendez-vous, in the execution of parallel tasks. 

# let def sync1 () | sync2 () = reply to sync1 | reply to sync2 ;;
val sync2 : unit -> unit
val sync1 : unit -> unit 

The definition includes two reply constructs, which makes the mention of a 
port mandatory. The example below illustrates how the barrier can be used to 
constrain the interleaving of concurrent tasks. The possible outputs are given by 
the regular expression (12|21)*. 

# spawn { for i = 0 to 9 do sync1(); print_int 1 done; };
# spawn { for i = 0 to 9 do sync2(); print_int 2 done; } ;;
12121212121212121212 
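
A comparable two-party rendezvous can be sketched in OCaml with a mutex and a condition variable, assuming OCaml 5. This is not how JoCaml compiles the join pattern; the record type barrier and its fields are hypothetical names. The first party to arrive waits, and the second one completes the round and wakes it up. Unlike sync1 and sync2, the sketch uses one symmetric sync function for both parties.

```ocaml
(* Sketch of a reusable two-party barrier; all names are illustrative. *)
type barrier = {
  lock : Mutex.t;
  released : Condition.t;
  mutable waiting : int;   (* parties that have arrived in this round *)
  mutable round : int;     (* number of completed rounds *)
}

let create () =
  { lock = Mutex.create (); released = Condition.create ();
    waiting = 0; round = 0 }

let sync b =
  Mutex.lock b.lock;
  let my_round = b.round in
  b.waiting <- b.waiting + 1;
  if b.waiting = 2 then begin
    (* Second arrival: complete the round and wake the first party. *)
    b.waiting <- 0;
    b.round <- my_round + 1;
    Condition.broadcast b.released
  end else begin
    (* First arrival: block until the round counter moves on. *)
    while b.round = my_round do
      Condition.wait b.released b.lock
    done
  end;
  Mutex.unlock b.lock

let () =
  let b = create () in
  let task () = for _ = 1 to 10 do sync b done in
  let d = Domain.spawn task in
  task ();
  Domain.join d;
  Printf.printf "completed rounds: %d\n" b.round
```

Neither task can run ahead of the other by more than one step, which is the same interleaving constraint as in the JoCaml example.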



Fork/Join Parallelism. Our next example is similar but more general. Consider
the sequential function let evalSeq (f,g) t = (f t, g t). We define
a variant, evalPar, that performs the two computations f t and g t in
parallel, then joins the two results. We use local channels cf and cg to collect
the results, we spawn an extra process for cf (f t), and evaluate g t in the
main body of evalPar: 

# let evalPar (f,g) t =
#   let def cf! u | cg v = reply (u,v) in
#   spawn { cf (f t) }; cg (g t) ;;
#
# let xycoord = evalPar (cos, sin) ;;
val evalPar : ('a -> 'b) * ('a -> 'c) -> 'a -> 'b * 'c
val xycoord : float -> float * float 
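
The same fork/join shape can be sketched in OCaml 5 with the standard Domain module, where Domain.spawn starts the extra computation and Domain.join collects its result. The name eval_par is a hypothetical OCaml counterpart of evalPar, not part of JoCaml.

```ocaml
(* Sketch: evaluate f t in a fresh domain and g t in the caller. *)
let eval_par (f, g) t =
  let d = Domain.spawn (fun () -> f t) in  (* fork *)
  let v = g t in                           (* work in the main body *)
  let u = Domain.join d in                 (* join *)
  (u, v)

let xycoord = eval_par (cos, sin)

let () =
  let x, y = xycoord 0.0 in
  Printf.printf "%.1f %.1f\n" x y
```

Spawning a domain is much heavier than spawning a JoCaml process, so this sketch only pays off when f t is expensive.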

Bi-directional Channels. Bi-directional channels appear in most process calculi,
and in programming languages such as PICT [24] and CML [25]. In the
asynchronous pi calculus, for instance, and for a given channel c, a value v can
be sent asynchronously on c (written c![v]) or received from c and bound to
some variable x in some guarded process P (written c?(x).P). Any process can
send and receive on the channels it knows. Finally, the scope of a pi calculus
channel name c is defined by the “new” binding νc.P. In JoCaml, a process 
can only send messages whereas, for a given name, a unique definition binds the 
name and receives messages on that name. Nonetheless, bi-directional channels 
can be defined as follows: 

# type 'a pi_channel = { snd : <<'a>> ; rcv : unit -> 'a }
# let def new_pi_channel () =
#   let def send! x | receive () = reply x in
#   reply {snd=send; rcv=receive} ;;
type 'a pi_channel = { snd: <<'a>>; rcv: unit -> 'a }
val new_pi_channel : unit -> 'b pi_channel 

A pi calculus channel is implemented by a join definition with two port names. 
The port name send is asynchronous and is used to send messages on the channel. 
Such messages can be received by making a synchronous call to the other port 
name receive. Finally, the new pi calculus channel is packaged as a record of 
the two new JoCaml names. (Processes and OCaml records both use braces, but 
in different syntactic contexts.) 

Let us now “translate” the pi calculus process 

νc, d.( c![1] | c![5] | c?(x).d![x + x] | d?(y).print_int(y) ) 

We obtain a similar (but more verbose) process: 

# spawn {
#   let c,d = new_pi_channel(), new_pi_channel() in
#   c.snd 1 | c.snd 5 |
#   {let x = c.rcv() in d.snd (x+x)} |
#   {let y = d.rcv() in print_int y;} } ;;
=> 2 

Synchronous pi calculus channels are encoded just as easily as asynchronous 
ones: it suffices to make send synchronous: 

# type 'a pi_sync_channel = { snd : 'a -> unit; rcv : unit -> 'a }
# let def new_pi_sync_channel () =
#   let def send x | receive () =
#     reply x to receive | reply to send in
#   reply {snd=send; rcv=receive} ;;
type 'a pi_sync_channel = { snd: 'a -> unit; rcv: unit -> 'a }
val new_pi_sync_channel : unit -> 'b pi_sync_channel 






2.4 Concurrent Data Structures 

We continue our exploration of message passing in JoCaml, and now consider 
some concurrent data structures. (In practice, one would often use the built-in 
data structures inherited from OCaml rather than their JoCaml internal encod- 
ings.) 



A Reference Cell. Mutable data structures can be encoded using internal 
messages that carry the state of the object. A basic example is the imperative 
variable, also known as reference cell. One method (get) examines the content 
of the cell, while another (set) alters it. 

# type 'a jref = { set: 'a -> unit; get: unit -> 'a }
#
# let def new_ref u =
#   let def state! v | get () = state v | reply v
#        or state! v | set w = state w | reply in
#   state u | reply {get=get; set=set}
#
# let r0 = new_ref 0 ;;
type 'a jref = { set: 'a -> unit; get: unit -> 'a }
val new_ref : 'b -> 'b jref
val r0 : int jref 

Here, the internal state of a cell is its content; it is stored as a message v on 
channel state. Lexical scoping is used to keep the state internal to a given cell. 

Also, note that the types 'a jref and 'a pi_sync_channel are isomorphic;
indeed, objects such as mutable references, bi-directional channels, n-place 
buffers, queues, . . . may have the same method interface and implement diverse 
concurrent behaviors. 



A Concurrent FIFO. Our second example is more involved. A concurrent 
FIFO queue is a data structure that provides two methods put and get to 
add and retrieve elements from the queue. Unlike a functional queue, however, 
getting from an empty queue blocks until an element is added, instead of raising 
an exception. 

We give below an implementation that relies (as usual) on two internal lists 
to store the current values in the queue, but also supports concurrent get and
put operations. We use local asynchronous messages to represent the state of 
the lists, with different messages for empty lists (inN, outN) and non-empty lists 
(inQ, outQ). 

— Requests on put are always processed at once, using one of the first two 
patterns, according to the state of the input list. 

— Requests on get proceed if the output list is non-empty (third pattern): the
auxiliary outX channel then returns the head value and updates the state
of the output list. Requests on get can also proceed if the output list is empty
and the input list is non-empty (fourth pattern). To this end, the input list is
reversed and transferred to the output list. 

— There is no pattern for get when both lists are empty, so get requests are 
implicitly blocked in this case. 

— Initially, both lists are empty. 

The queue is polymorphic, but its usage is briefly illustrated using integers and 
series of concurrent puts and gets. 

# type 'a buffer = { get : unit -> 'a ; put : 'a -> unit }
# let def new_fifo () =
#   let def
#        put i | inN! () = inQ [i] | reply
#     or put i | inQ! is = inQ (i::is) | reply
#     or get() | outQ! os = reply outX os
#     or get() | outN! () | inQ! is =
#          inN () | reply outX (List.rev is)
#     or outX os =
#          reply List.hd os | let os' = List.tl os in
#          { if os' = [] then outN() else outQ os' }
#   in
#   inN() | outN() | reply {get=get; put=put} ;;
#
# let f = new_fifo() in
# spawn { for i = 1 to 9 do f.put i done; };
# spawn { for i = 1 to 5 do print_int (f.get()) done; } ;;
type 'a buffer = { get: unit -> 'a; put: 'a -> unit }
val new_fifo : unit -> 'b buffer
12345 
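
For readers more familiar with lock-based code, the same two-list queue with a blocking get can be sketched in OCaml 5 using Mutex and Condition. This is an assumed translation, not the JoCaml encoding itself: the four state messages (inN, inQ, outN, outQ) collapse into two list references, and the missing pattern for "both lists empty" becomes a wait on a condition variable. The names nonempty, input, output and take are hypothetical.

```ocaml
type 'a buffer = { get : unit -> 'a; put : 'a -> unit }

(* Sketch: two-list FIFO whose get blocks while the queue is empty. *)
let new_fifo () =
  let m = Mutex.create () in
  let nonempty = Condition.create () in
  let input = ref [] and output = ref [] in
  let put x =
    Mutex.lock m;
    input := x :: !input;          (* newest element first *)
    Condition.signal nonempty;
    Mutex.unlock m
  in
  (* Called with m held. *)
  let rec take () =
    match !output with
    | x :: rest -> output := rest; x
    | [] ->
      (match !input with
       | [] -> Condition.wait nonempty m; take ()   (* both lists empty *)
       | is -> output := List.rev is; input := []; take ())
  in
  let get () =
    Mutex.lock m;
    let x = take () in
    Mutex.unlock m;
    x
  in
  { get; put }

let () =
  let f = new_fifo () in
  for i = 1 to 9 do f.put i done;
  for _ = 1 to 5 do print_int (f.get ()) done;
  print_newline ()
```

The usage above is sequential for simplicity; running put and get in separate domains would exercise the blocking path.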



2.5 Types and Exceptions 

A word on typing. The JoCaml type system is derived from ML and it should 
be no surprise to functional programmers. In particular, it extends parametric 
polymorphism to the typing of channels. We refer to [9] for a detailed discussion. 

Experienced ML programmers may wonder how the JoCaml type system 
achieves mixing parametric polymorphism and mutable data structures. There 
is no miracle here. Consider, again, the JoCaml encoding of a reference cell: 

# let def state! v | get () = state v | reply v
#      or state! v | set w = state w | reply ;;
val get : unit -> '_a
val state : <<'_a>>
val set : '_a -> unit

The type variable '_a that appears inside the types for state, get and set 
is prefixed by an underscore. Such variables are non-generalized type variables 
that can be instantiated only once. That is, all the occurrences of state must 
have the same type. Operationally, once ’_a is instantiated with some type, this 
type replaces '_a in any other types where it occurs (here, the types for get 
and set). This guarantees that the various port names whose type contains ’_a 
(state, get and set here) are used consistently. 

For instance, in the following program, state 0 and print_string (get ())
force two incompatible instantiations, which leads to a type-checking error (and
actually avoids printing an integer while believing it is a string). 

# let def state! v | get () = state v | reply v
#      or state! v | set w = state w | reply ;;
#
# spawn {state 0}; print_string (get()) ;;
File "ex26.ml", line 6, characters 32-37:
This expression has type int but is here used with type string 

More generally, whenever the types of several co-defined port names share a type 
variable, this variable is not generalized. (In ML, the same limitation occurs 
in the types of identifiers defined by a value binding.) A workaround is to en- 
capsulate the definition into another one, which gives another opportunity to 
generalize type variables: 

# let def new_ref v =
#   let def state! v | get () = state v | reply v
#        or state! v | set w = state w | reply
#   in spawn {state v}; reply (get, set) ;;
val new_ref : 'a -> (unit -> 'a) * ('a -> unit) 
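
The ML limitation mentioned above, and the same workaround, can be observed in plain OCaml. In the sketch below, cell is a hypothetical top-level reference whose type variable is not generalized, while new_ref wraps the allocation in a function so that each call gets a fresh, independently typed cell.

```ocaml
(* A top-level reference to an empty list gets a weak, non-generalized
   type variable; its first use fixes the type once and for all. *)
let cell = ref []
let () = cell := [0]          (* cell is now an int list ref *)
(* cell := ["a"] would be rejected by the type-checker at this point. *)

(* Workaround: encapsulate the definition in a function.  new_ref is
   fully polymorphic: 'a -> (unit -> 'a) * ('a -> unit). *)
let new_ref v =
  let state = ref v in
  ((fun () -> !state), (fun w -> state := w))

let get_i, set_i = new_ref 0      (* an int cell *)
let get_s, set_s = new_ref "a"    (* an independent string cell *)

let () =
  set_i 5;
  set_s "b";
  Printf.printf "%d %s\n" (get_i ()) (get_s ())
```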



Exceptions. Exceptions and exception handling within expressions behave as 
in OCaml. If an exception is not caught in the current expression, however, its 
handling depends on the synchrony of the process. 

If the process is asynchronous, the exception is printed on the standard out- 
put and the asynchronous process terminates. No other process is affected. 

# spawn { failwith "Bye"; }; print_string "Done" ;; 

=> Uncaught exception : Failure ("Bye") 

Done 

If the process is synchronous, every joint call terminates with the exception 
instead of a reply. In particular, when a pattern contains several synchronous 
channels, the exception is replicated and thrown to all blocked callers: 

# let catch x = try x() with Failure s -> print_string s in
# let def a () | b () =
#   failwith "Bye "; reply to a | reply to b in
# spawn { {catch a;} | {catch b;} } ;;
Bye Bye 
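
OCaml 5 domains offer a loose analogue of the synchronous case: a caller blocked in Domain.join receives the exception raised by the joined computation instead of a result. The helper call_remote below is a hypothetical name used only for this sketch, and it covers a single waiting caller rather than several joint callers.

```ocaml
(* Sketch: the waiting caller gets the exception in place of a reply. *)
let call_remote (work : unit -> int) : (int, string) result =
  let d = Domain.spawn work in
  match Domain.join d with
  | v -> Ok v
  | exception Failure s -> Error s

let () =
  match call_remote (fun () -> failwith "Bye") with
  | Ok n -> Printf.printf "reply: %d\n" n
  | Error s -> Printf.printf "caught: %s\n" s
```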

Exercise 1. The “core join calculus” consists only of asynchronous channels and 
processes. Sketch an encoding of synchronous channels and expressions into this 
subset of JoCaml. (Hint: this essentially amounts to a call-by-value continuation- 
passing encoding.) 

Exercise 2 (Fairness). What kind of fairness is actually provided by JoCaml 
when several messages are available on the same channel? When different pat- 
terns could be used for the same messages? Try to define stronger fairness prop- 
erties and to implement them for some examples of join patterns. 

3 Distributed Programming 

JoCaml has been designed to provide a simple and well-defined model of dis- 
tributed programming. Since the language entirely relies on asynchronous mes- 
sage passing, programs can either be used on a single machine (as described in 
the previous section), or they can be executed in a distributed manner on several 
machines. 

In this section, we give a more explicit account of distribution. We describe 
support for execution on several machines and new primitives that control local- 
ity, migration, and failure. To this end, we interleave a description of the model 
with a series of examples that illustrate the use of these primitives. 

3.1 The Distributed Model 

The execution of JoCaml programs can be distributed among several machines, 
possibly running different systems; new machines may join or quit the compu- 
tation. At any time, every process or expression is running on a given machine. 
However, they may migrate from one machine to another, under the control of 
the language. In this implementation, the runtime support consists of several 
system-level processes that communicate using TCP/IP over the network. 

In JoCaml, the execution of a process (or an expression) does not usually 
depend on its localization. Indeed, it is equivalent to run processes P and Q 
on two different machines, or to run the compound process { P | Q } on a
single machine. In particular, the scope for defined names and values does not 
depend on their localization: whenever a port name appears in a process, it 
can be used to form messages (using the name as the address, or as the message 
content) without knowing whether this port name is locally- or remotely-defined, 
and which machine will actually handle the message. As a first approximation, 
locality is transparent, and programs can be written independently of their run- 
time distribution. 

Of course, locality matters in some circumstances: side-effects such as printing 
values on the local console depend on the current machine; besides, efficiency
can be affected because message sending over the network takes much longer
than local calls; finally, the termination of some underlying runtime will affect 
all its local processes. For these reasons, locality is explicitly controlled by the 
programmer, and can be adjusted using migration. Conversely, resources such 
as definitions and processes are never silently relocated by the system — the pro- 
grammer interested in dynamic load-balancing must code relocation as part of 
his application. 

An important issue when passing messages in a distributed system is whether 
the message content is copied or passed by reference. This is the essential differ- 
ence between functions and synchronous channels. 

- When a function is sent to a remote machine, a copy of its code and values
for its local variables are also sent there. Afterwards, any invocation will be
executed locally on the remote machine.

- When a synchronous port name is sent to a remote machine, only the name
is sent (with adequate routing information) and invocations on this name
will forward the invocation to the machine where the name is defined, much
as in a remote procedure call.



The name-server. Since JoCaml has lexical scoping, programs being executed 
on different runtimes do not initially share any port name; therefore, they would 
normally not be able to interact with one another. To bootstrap a distributed 
computation, it is necessary to exchange a few names, and this is achieved using 
a built-in library called the name server. Once this is done, these first names can 
be used to communicate some more values (and in particular port names) and 
to build more complex communication patterns. 

The interface of the name server mostly consists of two functions to regis- 
ter and look up arbitrary values in a “global table” indexed by plain strings. 
Pragmatically, when a JoCaml program (or top-level) is started, it takes as pa- 
rameters the IP address and port number of a name server. The name server 
itself can be launched using the command jocns. 

The following program illustrates the use of the name server, with two pro- 
cesses running in parallel (although still in the same runtime). One of them 
locally defines some resource (a function f that squares integers) and registers it 
under the string square. The other process is not within the scope of f; it looks
up the value registered under the same string, locally binds it to sqr, then
uses it to print something. 

# spawn{ let def f x = reply x*x
#        in Ns.register "square" f vartype; }
#
# spawn{ let sqr = Ns.lookup "square" vartype
#        in print_int (sqr 2); }

Warning: VARTYPE replaced by type ( int -> int) metatype
Warning: VARTYPE replaced by type ( int -> int) metatype
4

The vartype keyword stands for the (runtime representation of the) type of
the value that is being registered or looked up, which is automatically inserted 
by the compiler. When a value is registered, its type is explicitly stored with it. 
When a value is looked up, the stored type is compared with the inferred type in 
the receiving context; if these types do not match, an exception TypeMismatch 
is raised. This limited form of dynamic typing is necessary to ensure type safety. 
To prevent (and explain) runtime TypeMismatch exceptions, the compiler also 
issues a warning that provides the inferred vartype at both ends of the name 
server, here int -> int. (When writing distributed program fragments, it is 
usually a good idea to share the type declarations in a single .mli file and to 
explicitly write these types when calling the name server.) 

Of course, using the name server makes sense only when the two processes 
are running as part of stand-alone programs on different machines, and when 
these processes use the same conventional strings to access the name server. To 
avoid name clashes when using the same name server for unrelated computations, 
the indexed string is prefixed by a local identifier Ns.user; by default, Ns.user
contains the local user name.
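
The register/lookup discipline described above, where a value is stored together with a runtime representation of its type and checked on lookup, can be approximated in plain OCaml. The following is only a sketch and not the JoCaml Ns implementation: the witness type ty, the table, and the exception name Type_mismatch are illustrative assumptions, and only a small universe of types is covered.

```ocaml
(* Runtime representation of a small universe of types.  Assumption:
   JoCaml's vartype covers arbitrary types; this sketch covers three. *)
type _ ty =
  | Int : int ty
  | String : string ty
  | Arrow : 'a ty * 'b ty -> ('a -> 'b) ty

(* Evidence that two types are equal. *)
type (_, _) eq = Refl : ('a, 'a) eq

(* Compare two type representations structurally. *)
let rec equal : type a b. a ty -> b ty -> (a, b) eq option =
  fun t1 t2 ->
    match t1, t2 with
    | Int, Int -> Some Refl
    | String, String -> Some Refl
    | Arrow (a1, r1), Arrow (a2, r2) ->
      (match equal a1 a2, equal r1 r2 with
       | Some Refl, Some Refl -> Some Refl
       | _ -> None)
    | _ -> None

(* A table entry packs a value with the representation of its type. *)
type entry = Entry : 'a ty * 'a -> entry

exception Type_mismatch

(* The "global table" indexed by plain strings (hypothetical, in-process). *)
let table : (string, entry) Hashtbl.t = Hashtbl.create 16

let register (name : string) (t : 'a ty) (v : 'a) : unit =
  Hashtbl.replace table name (Entry (t, v))

(* Look up a value; the stored type must match the expected one. *)
let lookup : type a. string -> a ty -> a =
  fun name expected ->
    match Hashtbl.find table name with
    | Entry (stored, v) ->
      (match equal stored expected with
       | Some Refl -> v
       | None -> raise Type_mismatch)

(* Same scenario as the square example. *)
let () = register "square" (Arrow (Int, Int)) (fun x -> x * x)
let sqr = lookup "square" (Arrow (Int, Int))
let () = print_int (sqr 2)
```

Unlike vartype, the type representation is written by hand at both ends here, which corresponds to the advice of writing the types explicitly when calling the name server.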



3.2 Locations and Mobility 

So far, the localization of processes and expressions is entirely static. In some 
cases, a more flexible control is called for. Assume that “square computations” 
are best performed only on the server machine that exports the square port, 
and that a client machine needs to compute sums of squares. If the client uses a 
loop to compute the sum by remote calls on square, each call within the loop 
would result in two messages on the network (one for the request, and another 
one for the answer). It would be better to run the loop on the machine that 
actually computes the squares. Yet, we would prefer not to modify the program 
running on the server every time we need to run a different kind of loop that 
involves numerous squares. 

To this end, we introduce a unit of locality called “location”. A location 
contains a bunch of definitions and running processes “at the same place”. Every
location is given a name, and these location names are first-class values. They can 
be communicated as content of messages and registered to the name server. These 
location names can also be used as arguments to primitives that dynamically 
control the relations between locations. 



Basic examples. Locations can be declared either locally or as a top-level 
statement. For instance, we create a new location named here: 

# let loc here
#   def square x = reply x*x
#   and cubic x = reply (square x)*x
#   do { print_int (square 2); }
# ;;
# print_int (cubic 2)
val here : Join.location
val cubic : int -> int
val square : int -> int

48

This let loc declaration binds a location name here and two port names square 
and cubic whose scope extends to the location and to the following statements. 
Here, the location also has an initial process print_int (square 2) ; intro- 
duced by do {} (much like spawn {} in expressions). This process runs within 
the location, in parallel with the remaining part of the program. As a result, we 
can obtain either 84 or 48. 

Distributed computations are organized as trees of nested locations; every 
definition and every process is permanently attached to the location where it 
appears in the source program. Since let locs can occur under guards, processes
and expressions can create new locations at runtime, with their initial content 
(bindings and processes) and a fresh location name. The new location is placed 
as a sub-location of the location that encloses the let loc. Once created, there 
is no way to add new bindings and processes to the location from outside the 
location. 

For instance, the following program defines three locations such that the lo- 
cations named kitchen and living_room are sub-locations of house. As regards 
the scopes of names, the locations kitchen, living_room and the ports cook, 
switch, on, off all have the same scope, which extends to the whole house 
location (between the first do { and the last }). Only the location name house 
is visible from the rest of the source file. 

# let loc house do {
#   let loc kitchen
#     def cook() = print_string " Cooking... "; reply
#     do {}
#   and living_room
#     def switch() | off!() = print_string "Music on."; reply | on()
#     or  switch() | on!() = print_string "Music off."; reply | off()
#     do { off() }
#   in
#   switch(); cook(); switch(); }
val house : Join.location

Music on. Cooking... Music off.



Mobile Agents. While processes and definitions are statically attached to 
their location, locations can move from one enclosing location to another. Such 
migrations are triggered by a process inside of the moving location (a “subjective 
move”, in Cardelli’s terminology [5]). As a result of the migration, the moving 
location becomes a sub-location of its target location. Note that locations can
be used for several purposes: as destination addresses, as mobile agents, or as a
combination of the two. 

Our next example is an agent-based program to compute a sum of squares. 
On the server side, we create an empty location, here, and we register it on the 
name-server; its name will be used as the target address for our mobile agent. 

# let loc here do {} in Ns.register "here" here vartype
# ;;
# Join.server ()

(The call to Join.server () prevents the immediate termination of the JoCaml
runtime, even if it has no active process or expression: further local activity can 
occur later, as the result of remote messages and migrations.) 

On the client side, we create another location, mobile, that wraps the loop 
computation that should be executed on the square side; the process within 
mobile first looks up the name here, then moves itself inside of “here”, and 
finally performs the computation. 

# let loc mobile
#   do {
#     let there = Ns.lookup "here" vartype in
#     go there;
#     let sqr = Ns.lookup "square" vartype in
#     let def sum (s,n) =
#       reply (if n = 0 then s else sum (s+sqr n, n-1)) in
#     print_int (sum(0,5));
#   }

The go there expression migrates the mobile location with its current con- 
tent to the server machine, as a sub-location of location here, then completes 
and returns () . Afterwards, the whole computation (calls to the name server, to 
sqr and to sum) is local to the server. There are only three messages exchanged 
between the machines: one for Ns. lookup, one for the answer, and one for the 
migration. 
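
To make the message count concrete, the loop can be modelled in plain OCaml with a counter standing in for network traffic. This is only an illustrative model under stated assumptions: remote_sqr and the messages counter are hypothetical names, each remote call is charged one request and one reply as described at the start of this section, and the cost of looking up square from the client is ignored.

```ocaml
(* Number of network messages exchanged so far (illustrative model). *)
let messages = ref 0

(* A remote call to square: one request and one reply on the network. *)
let remote_sqr n =
  messages := !messages + 2;
  n * n

(* The same loop as in the mobile agent, run without migration. *)
let rec sum (s, n) = if n = 0 then s else sum (s + remote_sqr n, n - 1)

let total = sum (0, 5)

(* Without migration: two messages per iteration of the loop. *)
let remote_cost = !messages

(* With migration: lookup, answer, and the migration itself. *)
let migration_cost = 3

let () =
  Printf.printf "sum=%d remote=%d migration=%d\n"
    total remote_cost migration_cost
```

In this model the remote version grows linearly with the number of iterations, while the cost of the agent-based version stays constant.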

Applets. The next example shows how to define applets. An applet is a pro- 
gram that is downloaded from a remote server, then used locally. As compared 
to the previous examples, migration operates the other way round, from the 
server to the client. For our purposes, the applet implements a mutable cell with 
destructive reading: 

# let def new_cell there =
#   let def log s = print_string ("cell "^s^"\n"); reply in
#   let loc applet
#     def get() | some! x = log ("is empty"); none() | reply x
#     and put x | none!() = log ("contains "^x); some x | reply
#     do { go there; none () } in
#   reply get, put
# ;;
# Ns.register "cell" new_cell vartype;
# Join.server ()

Our applet has two states: either none() or some s where s is a string, and
two methods get and put. Each time cell is called, it creates a new applet in 
its own location. Thus, numerous independent cells can be created and shipped 
to callers. The name cell takes as argument the location (there) where the 
new cell should reside. The relocation is controlled by the process go there; 
none () that first performs the migration, then sends an internal message to 
activate the cell. Besides, cell defines a log function outside of the applet. The 
latter therefore remains on the server and, when called from within the applet 
on the client machine, keeps track of the usage of its cell. This is in contrast 
with applets a la Java: the location migrates with its code, but also with its 
communication capabilities unaffected. 

We supplement our example with a basic user that allocates and uses a local 
cell: 

# let cell = Ns.lookup "cell" vartype
#
# let loc user
#   do {
#     let get, (put : string -> unit) = cell user in
#     put "world";
#     put ("Hello, "^get ());
#     print_string (get ());
#   }

On the client machine, we observe “Hello, world” on the console, as could be 
expected. Besides, on the server side, we observe the log: 

cell is empty 
cell contains world 
=> cell is empty 
=> cell contains Hello, world 
cell is empty 

On the client machine, there are no more go primitives in the applet after its 
arrival, and this instance of the location name applet does not appear anywhere. 
As a result, the contents of the applet can be considered part of the host location, 
as if this content had been defined locally in the beginning. (Some other host 
location may still move, but then it would carry the cell applet as a sub-location.) 

Exercise 3 (Local State). What is the experimental distributed semantics of mu- 
table references? What about global references and modules? Write a function 
that allocates a “correct” distributed reference with an interface for reading, 
writing, and relocating the reference.

3.3 Termination, Failures, and Failure Recovery

As a matter of fact, some parts of a distributed computation may fail (e.g.,
because a machine is abruptly switched off). The simplest solution would be to
abort the whole computation whenever this is detected, but this is not realistic
when numerous machines are involved. Rather, we would like our programs
to detect such failures and take adequate measures, such as cleanly reporting the
problem, aborting related parts of the computation, or making another attempt on
a different machine. To this end, JoCaml provides an abstract model of failure
and failure detection expressed in terms of locations:

- a location can run a primitive process halt() that, when executed, atom-
ically halts every process inside of this location (and recursively every sub-
location);

- a location can detect that another location with name there has halted, us-
ing a primitive expression of the form fail there; P. The expression blocks
until the failure of location there is detected. When the process P runs, it
is guaranteed that location there is halted for any other location trying to
access there.
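
The tree structure behind these rules (nested locations, migration with go, and recursive halting) can be pictured with a small mutable tree in plain OCaml. This is a rough model only, under the assumption that a location is fully described by its parent, its sub-locations, and a halted flag; the record type and the functions create, go, and halt are hypothetical names and say nothing about how the JoCaml runtime is implemented.

```ocaml
(* A location: a node in the tree of nested locations (illustrative). *)
type location = {
  name : string;
  mutable parent : location option;
  mutable children : location list;
  mutable halted : bool;
}

(* Create a location, optionally as a sub-location of [parent]. *)
let create ?parent name =
  let loc = { name; parent; children = []; halted = false } in
  (match parent with
   | Some p -> p.children <- loc :: p.children
   | None -> ());
  loc

(* Migration: [loc] becomes a sub-location of [target], carrying its
   own sub-locations along.  Physical equality (!=) is used because
   the structure is cyclic. *)
let go loc target =
  (match loc.parent with
   | Some p -> p.children <- List.filter (fun c -> c != loc) p.children
   | None -> ());
  loc.parent <- Some target;
  target.children <- loc :: target.children

(* Halting a location recursively halts every sub-location. *)
let rec halt loc =
  loc.halted <- true;
  List.iter halt loc.children

(* The mobile agent scenario: mobile moves from the client under here. *)
let server = create "server"
let here = create ~parent:server "here"
let client = create "client"
let mobile = create ~parent:client "mobile"
let () = go mobile here
let () = halt here
```

After these steps, mobile is halted because it now lives under here, while client and server are unaffected. The sketch does not check that a location is never moved into one of its own descendants.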

The halt primitive can be seen as a way to reflect, in the model, the abrupt 
failure of a machine that hosts the halted locations. For instance, a fallible 
machine running a process P can be seen as a top-level location 

let loc runtime do { P | halt() }

Since locations fail only as a whole, the programmer can define locations 
as suitable units of failure recovery, pass their names to set up remote failure 
detection, and even use halt and fail primitives to control the computation. 
By design, however, no silent recovery mechanism is provided: the programmer 
must figure out what to do in case of partial failure. 

The fail primitive is tricky to implement (it cannot be fully implemented on 
top of an asynchronous network, for instance). On the other hand, it does provide 
the expected negative guarantees: the failed location is not visible anymore, from 
any part of the computation, on any machine. In the current implementation, 
halting is detected only when (1) the halt () primitive is issued in the same 
runtime as the fail, or (2) the JoCaml runtime containing the location actually 
stops. (Thus, simply running halt () does not trigger matching fails in other 
runtimes, but exit 0; will trigger them.) 

A Computation Supervisor. There is usually no need to halt locations that
completed their task explicitly (the garbage-collector should take care of them).
However, in some cases we would like to be sure that no immigrant location is
still running locally.

Let us assume that job is a remote function within location there that may
create mobile sub-locations and migrate them to the caller’s site. To this end,
the caller should supply a host location, as in the previous examples. How can
we make sure that job is not using this location to run other agents after the
call completes? This is handled using a new temporary location box for each
call, and halting it once the function call has completed.

# let def safe! (job,arg,success,failure) =
#   let loc box
#     def kill!() = halt();
#     and start() = reply job (box,arg) in
#   let def got! x | live!() = got x | kill()
#        or got! x | halted!() = success x
#        or live!() | halted!() = failure () in
#   got (start()) | live() | fail box; halted()
val safe :
  <<((Join.location * 'a -> 'b) * 'a * <<'b>> * <<unit>>)>>

Our supervising protocol either sends a result on success or a signal on
failure. In both cases, the message guarantees that no alien computation may
take place afterward on the local machine.

The protocol consists of a host location and a supervisor definition. Initially,
there is a live() message and the supervisor waits for either a result on got or
some failure report on halted. Depending on the definition of job, the expression
job(box,arg) can create and move locations inside of the box, communicate
with the outside, and eventually reply some value within the box. Once this
occurs, got forwards the reply to the control process, and the first join-pattern is
triggered. In this case, the live() message is consumed and eventually replaced
by a halted() message (once the kill() message is handled, the box gets
halted, and the fail guard in the control process is triggered, releasing a message
on halted).

At this stage, we know for sure that no binding or computation introduced 
by job remains on the caller’s machine, and we can return the value as if a plain 
RPC had occurred. 
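
The three join patterns of the supervisor can also be read as a small state machine. The sketch below, in plain OCaml, is one possible reading rather than a translation of the JoCaml code: the type and constructor names (event, outcome, state, Killing) are hypothetical, and concurrency is replaced by an explicit list of events arriving in order.

```ocaml
(* Messages the supervisor may receive. *)
type 'a event =
  | Got of 'a      (* job replied with a value *)
  | Halted         (* fail box detected that box has halted *)

(* What is finally reported to the caller. *)
type 'a outcome = Success of 'a | Failure

(* Live: the live() message is still pending.
   Killing x: the result x is known and kill() has been sent.
   Done o: the outcome o has been reported. *)
type 'a state = Live | Killing of 'a | Done of 'a outcome

let step state event =
  match state, event with
  | Live, Got x -> Killing x                 (* got x | live   => got x | kill() *)
  | Killing x, Halted -> Done (Success x)    (* got x | halted => success x *)
  | Live, Halted -> Done Failure             (* live | halted  => failure () *)
  | (Killing _ | Done _), _ -> state         (* no join pattern applies *)

(* Run the protocol on a sequence of events. *)
let run events = List.fold_left step Live events
```

For instance, run [Got 3; Halted] ends in Done (Success 3), whereas run [Halted] ends in Done Failure: the result is released only once the box is known to be halted.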

This “wrapper” is quite general. Once a location-passing convention is chosen, 
the safe function does not depend on the actual computation performed by job 
(its arguments, its results, and even the way it uses locations are parametric 
here). We could further refine this example to transform unduly long calls to 
job into failure (by sending a kill() message after an external timeout), and
to delegate some more control to the caller (by returning kill at once).

Exercise 4 (Mobility). Starting from your favorite functional program, add lo- 
cations and mobility control to distribute the program on several machines, and 
speed up the computation. 






A Concurrent Programming (Lab Session) 

We suggest a series of exercises to experiment with asynchronous message pass-
ing in JoCaml, including classic programming examples. We also provide some 
solutions. One may also begin with the examples in the previous sections, or 
even try to implement one’s favorite concurrent algorithm. 

Exercise 5 (Fibonacci). Assume we are computing the Fibonacci series on values 
with a slow (but parallel) addition, rather than integers. For example: 

# type slow = int
#
# let delay = ref 0.1
# let add (a:slow) (b:slow) =
#   Thread.delay !delay; (a+b : slow)
#
# let rec fib = function
#   | 0 -> 0 | 1 -> 1 | n -> add (fib (n-1)) (fib (n-2)) ;;
type slow = int
val delay : float ref
val add : slow -> slow -> slow
val fib : int -> slow

Write a faster, parallel version of fib. What kind of speedup should we 
obtain? What is actually observed? Does that depend on delay? 

Exercise 6 (Locks). Write a JoCaml implementation of locks, with the following 
interfaces: 

1. basic lock, with a synchronous get channel to acquire the lock and an asyn- 
chronous release channel to release the lock. 

2. n-user lock, with the same interface but up to n concurrent holders for the 
lock. 

3. reader-writer locks, with interface

- acquire_shared to get a non-exclusive lock (or block until available),

- release_shared to release the non-exclusive lock,

- acquire_exclusive to get an exclusive lock (or block until available),
- release_exclusive to release it.

4. reader-writer locks with fairness between writers and readers: provided all
locks are eventually released, any acquire request is eventually granted.

Solutions 

Fibonacci (Exercise 5). We can use fork/join parallelism, e.g. 

# let rec pfib = function
#   | 0 -> 0 | 1 -> 1 | n ->
#     let def a! v | b u = { reply (add u v) } in
#     spawn {a (pfib (n-2))}; b(pfib(n-1))
val pfib : int -> Ex1.slow

We obtain (on a laptop running Windows XP):



# let time f v =
#   let t = Unix.gettimeofday () in let r = f v in
#   let t' = Unix.gettimeofday () in t' -. t in
# let test size =
#   let t0,t1 = time fib size, time pfib size in
#   Printf.printf
#     "delay=%1.1e size=%2d base=%4.2f fj=%4.2f speedup=%f"
#     !delay size t0 t1 ((t0-.t1)/.t0); print_newline () in
# delay := 0.001 ; test 12;
# delay := !delay /. 10.; test 16;
# delay := !delay /. 10.; test 17;
# delay := !delay /. 10.; test 18;
# delay := !delay /. 10.; test 19;
# delay := !delay /. 10.; test 19;
# delay := !delay /. 10.; test 20;

delay=1.0e-03 size=12 base=0.54 fj=0.02 speedup=0.953618
delay=1.0e-04 size=16 base=3.73 fj=0.08 speedup=0.978815
delay=1.0e-05 size=17 base=5.53 fj=0.13 speedup=0.975786
delay=1.0e-06 size=18 base=4.61 fj=0.22 speedup=0.951420
delay=1.0e-07 size=19 base=7.57 fj=0.45 speedup=0.940275
delay=1.0e-08 size=19 base=0.19 fj=0.38 speedup=-1.059140
delay=1.0e-09 size=20 base=0.31 fj=0.91 speedup=-1.999999
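
One way to reason about the figures above is to count how many slow additions each version has to wait for. The plain OCaml sketch below assumes an idealized machine with unbounded parallelism and no scheduling overhead; the function names additions, depth, and estimate are illustrative. Note also that the speedup column printed by test is the fraction of time saved, (t0 - t1) / t0, not the ratio t0 / t1.

```ocaml
(* Number of calls to add performed by the sequential fib. *)
let rec additions n =
  if n < 2 then 0 else additions (n - 1) + additions (n - 2) + 1

(* Longest chain of additions that depend on one another in the
   fork/join version: the two recursive calls run in parallel. *)
let rec depth n =
  if n < 2 then 0 else 1 + max (depth (n - 1)) (depth (n - 2))

(* Idealized running times (sequential, parallel) for a given delay. *)
let estimate delay n =
  (float_of_int (additions n) *. delay, float_of_int (depth n) *. delay)

let () =
  let seq, par = estimate 0.001 12 in
  Printf.printf "additions=%d depth=%d seq=%.3fs par=%.3fs\n"
    (additions 12) (depth 12) seq par
```

Under these assumptions the sequential time grows like the Fibonacci numbers while the parallel time grows linearly in the size. The measured times are higher than the estimates, presumably because of timer granularity and thread overhead, and once the delay becomes negligible the overhead of the join definitions dominates, which matches the negative values in the last two rows.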



Locks (Exercise 6). 

# type lock = { acquire : unit -> unit ; release : <<unit>> }
#
# let new_lock() =
#   let def acquire() | release!() = reply in
#   spawn{ release() };
#   {acquire=acquire ; release=release} ;;
#
# let new_nlock n =
#   let def acquire() | token!() = reply in
#   for i = 1 to n do spawn{ token() } done;
#   {acquire=acquire ; release=token} ;;
#
# let new_rwlock() =
#   let def
#       acquire_exclusive() | idle!() = reply
#    or acquire_shared() | idle!() = shared 1 | reply
#    or acquire_shared() | shared! n = shared (n+1) | reply
#    or release_shared!() | shared! n =
#         if n==1 then idle() else shared (n-1) in
#   spawn { idle() };
#   {acquire=acquire_shared; release=release_shared} ,
#   {acquire=acquire_exclusive ; release=idle} ;;
#
# let new_rwfairlock() =
#   let def
#       acquire_exclusive() | idle!() = reply
#    or acquire_shared() | idle!() = shared 1 | reply
#    or acquire_shared() | shared! n = shared (n+1) | reply
#    or release_shared!() | shared! n =
#         if n==1 then idle() else shared (n-1)
#
#    or acquire_exclusive() | shared! n = waiting n | reply wait()
#    or release_shared!() | waiting! n =
#         if n==1 then ready() else waiting (n-1)
#    or wait() | ready!() = reply in
#   spawn { idle() };
#   {acquire=acquire_shared; release=release_shared} ,
#   {acquire=acquire_exclusive ; release=idle} ;;
type lock = { acquire: unit -> unit; release: <<unit>> }
val new_lock : unit -> lock
val new_nlock : int -> lock
val new_rwlock : unit -> lock * lock
val new_rwfairlock : unit -> lock * lock

In all these examples, we rely on the caller to enforce the lock discipline: 
only release once, after acquiring the lock. We could also provide abstractions 
to enforce it, e.g. 

# let synchronized lock job v =
#   lock.acquire(); let r = job v in spawn{ lock.release() }; r ;;
val synchronized : Ex4.lock -> ('a -> 'b) -> 'a -> 'b
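
As written, synchronized does not release the lock if job raises an exception. A possible refinement, sketched here in plain OCaml rather than JoCaml, uses Fun.protect (available from OCaml 4.08) to release the lock on every exit path. The record below is a stand-in for the lock type of this section, with release modelled as an ordinary function since plain OCaml has no asynchronous channels; counting_lock is a hypothetical test double, not a real lock.

```ocaml
(* Stand-in for the lock record: release is a plain function here. *)
type lock = { acquire : unit -> unit; release : unit -> unit }

(* Run [job v] while holding [lock]; the lock is released whether
   [job] returns normally or raises an exception. *)
let synchronized lock job v =
  lock.acquire ();
  Fun.protect ~finally:lock.release (fun () -> job v)

(* A fake lock that only counts how many times it is currently held. *)
let held = ref 0
let counting_lock =
  { acquire = (fun () -> incr held); release = (fun () -> decr held) }

let result = synchronized counting_lock (fun x -> x + 1) 41

(* Even when the job fails, the lock is given back. *)
let () =
  try synchronized counting_lock (fun () -> failwith "boom") ()
  with Failure _ -> ()
```

The design choice is the usual one for scoped resources: the caller never sees the release operation, so the "release once, after acquiring" discipline cannot be violated from outside.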



B Distributed and Mobile Programming (Lab Session) 

Exercise 7. Use the nameserver to exchange a simple string between two run- 
times: write a first program to register your name using the string "name", and
a second one to look up the string "name", and print it on the terminal.

Exercise 8 (Remote Shell Command). 

1. Write a program that registers a synchronous channel on the nameserver.
The channel takes a string as argument, calls Sys.command to execute it,
and returns the error code for the command.

Write a second program to look up this channel and execute some commands
remotely.

2. Write a program that registers a synchronous channel on the nameserver.
The channel returns a new location that can be used to send an agent on
the computer.

Write a second program that sends an agent to the location, lists all files in
the "/tmp" directory (use Unix.opendir, Unix.readdir [raises an exception
at end] and Unix.closedir), and returns the list on the caller machine.

Exercise 9. Write a “chat” program with JoCaml: 

1. Write a channel of type: <<string * string>> — the first string is a user 
name, the second string is a message from that user. Register the channel on 
the name server under your name (use Ns.user := "pub" in your program
for all users to be able to access your name). 

2. Write a function that sends a message to a friend: the function should lookup 
a channel on the nameserver from the friend name, then send a message on 
this channel. 

3. Add chat rooms to your program: 

- Write a chat rooms server, that will manage all the chat rooms.

- Write different programs:

• To list all the existing chat rooms 

• To add a new chat room 

• To join a chat room: this client should be able to read the user input 
(use the function read_line for this), send the message to the chat 
room, and display messages received from other users of the chat 
room. 



Name Server (Exercise 7). First program:

Ns.register "name" "My Name" vartype;;

Second program:

let name = Ns.lookup "name" vartype;;
print_string name; print_newline ();;



RSH (Exercise 8.1). First program:

let def rsh(command) = reply (Sys.command command);;
Ns.register "my computer name" rsh vartype;;

Second program:

let rsh = Ns.lookup "my computer name" vartype ;;
print_int (rsh "xterm"); print_newline () ;;

RSH (Exercise 8.2). First program:

let def rsh_loc () =
  let loc new_location do {} in
  reply new_location ;;

Ns.register "my computer name" rsh_loc vartype ;;

Second program:

let (rsh_loc : unit -> Join.location) =
  Ns.lookup "my computer name" vartype ;;

let def print_list! list =
  { List.iter (fun s -> print_string s; print_newline ()) list; } ;;

let loc listdir_agent do
  { Join.go (rsh_loc ());
    let list = ref [] in
    (let dir = Unix.opendir "/tmp" in
     try
       while true do
         list := (Unix.readdir dir) :: !list
       done
     with _ -> Unix.closedir dir);
    print_list !list
  }

Chat (Exercise 9). A complete implementation can be found at 
http://pauillac.inria.fr/jocaml/afp2002/chat.ml.



Acknowledgement. Many thanks to the members of the Moscova project at

Abstract. Functional reactive programming, or FRP, is a paradigm for
programming hybrid systems - i.e., systems containing a combination of 
both continuous and discrete components - in a high-level, declarative 
way. The key ideas in FRP are its notions of continuous, time-varying 
values, and time-ordered sequences of discrete events. 

Yampa is an instantiation of FRP as a domain-specific language embed- 
ded in Haskell. This paper describes Yampa in detail, and shows how 
it can be used to program a particular kind of hybrid system: a mobile 
robot. Because performance is critical in robotic programming, Yampa 
uses arrows (a generalization of monads) to create a disciplined style of 
programming with time-varying values that helps ensure that common 
kinds of time- and space-leaks do not occur. 

No previous experience with robots is expected of the reader, although a 
basic understanding of physics and calculus is assumed. No knowledge of 
arrows is required either, although we assume a good working knowledge 
of Haskell. 

This paper is dedicated in memory of 
Edsger W. Dijkstra 

for his influential insight that mathematical logic is and
must be the basis for sensible computer program construction.



1 Introduction 

Can functional languages be used in the real world, and in particular for real- 
time systems? More specifically, can the expressiveness of functional languages 
be used advantageously in such systems, and can performance issues be overcome 
at least for the most common applications? 

and Space Administration (NCC 2-1229). The second author was also supported by
an NSF Graduate Research Fellowship.

For the past several years we have been trying to answer these questions in
the affirmative. We have developed a general paradigm called functional reactive
programming that is well suited to programming hybrid systems, i.e. systems
with both continuous and discrete components. An excellent example of a hy- 
brid system is a mobile robot. From a physical perspective, mobile robots have 
continuous components such as voltage-controlled motors, batteries, and range 
finders, as well as discrete components such as microprocessors, bumper switches, 
and digital communication. More importantly, from a logical perspective, mobile 
robots have continuous notions such as wheel speed, orientation, and distance 
from a wall, as well as discrete notions such as running into another object, 
receiving a message, or achieving a goal. 

Functional reactive programming was first manifested in Fran, a domain 
specific language (DSL) for graphics and animation developed by Conal Elliott 
at Microsoft Research [5,4]. FRP [13,8,16] is a DSL developed at Yale that is 
the “essence” of Fran in that it exposes the key concepts without bias toward 
application specifics. FAL [6], Frob [11,12], Frisian [14], and Fruit [2] are four 
other DSLs that we have developed, each embracing the paradigm in ways suited 
to a particular application domain. In addition, we have pushed FRP toward 
real-time embedded systems through several variants including Real-Time FRP 
and Event-Driven FRP [18,17,15]. 

The core ideas of functional reactive programming have evolved (often in 
subtle ways) through these many language designs, culminating in what we now 
call Yampa, which is the main topic of this paper. ^ Yampa is a DSL embedded 
in Haskell and is a refinement of FRP. Its most distinguishing feature is that the 
core FRP concepts are represented using arrows [7], a generalization of monads. 
The programming discipline induced by arrows prevents certain kinds of time- 
and space-leaks that are common in generic FRP programs, thus making Yampa 
more suitable for systems having real-time constraints. 

Yampa has been used to program real industrial-strength mobile robots [10, 
8]^, building on earlier experience with FRP and Frob [11,12]. In this paper, 
however, we will use a robot simulator. In this way, the reader will be able to run 
all of our programs, as well as new ones that she might write, without having to 
buy a $10,000 robot. All of the code in this paper, and the simulator itself, are 
available via the Yampa home page at www.haskell.org/yampa. 

The simulated robot, which we refer to as a simbot, is a differential drive 
robot, meaning that it has two wheels, much like a cart, each driven by an 
independent motor. The relative velocity of these two wheels thus governs the 
turning rate of the simbot; if the velocities are identical, the simbot will go 
straight. The physical simulation of the simbot includes translational inertia, 
but (for simplicity) not rotational inertia. 

The motors are what makes the simbot go; but it also has several kinds of 
sensors. First, it has a bumper switch to detect when the simbot gets “stuck.” 

^ Yampa is a river in Colorado whose long placid sections are occasionally interrupted 
by turbulent rapids, and is thus a good metaphor for the continuous and discrete 
components of hybrid systems. But if you prefer acronyms, Yampa was started at 
YAle, ended in Arrows, and had Much Programming in between. 

^ In these two earlier papers we referred to Yampa as AFRP. 
That is, if the simbot runs into something, it will just stop and signal the program.
Second, it has a range finder that can determine the nearest object in any
given direction. In our examples we will assume that the simbot has independent 
range finders that only look forward, backward, left, and right, and thus we will 
only query the range finder at these four angles. Third, the simbot has what we 
call an “animate object tracker” that gives the location of all other simbots, as 
well as possibly some free-moving balls, that are within a certain distance from 
the simbot. You can think of this tracker as modelling either a visual subsystem 
that can see these objects, or a communication subsystem through which the 
simbots and balls share each others’ coordinates. Each simbot also has a unique 
ID and a few other capabilities that we will introduce as we need them. 

2 Yampa Basics 

The most important concept underlying functional reactive programming is that
of a signal: a continuous, time-varying value. One can think of a signal as having
polymorphic type:

Signal a = Time -> a

That is, a value of type Signal a is a function mapping suitable values of time
(Double is used in our implementation) to a value of type a. Conceptually, then,
the value of a signal s at some time t is just s(t).

For example, the velocity of a differential drive robot is a pair of numbers
representing the speeds of the left and right wheels. If the speeds are in turn
represented as type Speed, then the robot’s velocity can be represented as type
Signal (Speed, Speed). A program controlling the robot must therefore provide
such a value as output.

Being able to define and manipulate continuous values in a programming
language provides great expressive power. For example, the equations governing
the motion of a differential drive robot [3] are:

$$x(t) = \frac{1}{2} \int_0^t (v_r(t) + v_l(t)) \cos(\theta(t)) \, dt$$
$$y(t) = \frac{1}{2} \int_0^t (v_r(t) + v_l(t)) \sin(\theta(t)) \, dt$$
$$\theta(t) = \frac{1}{l} \int_0^t (v_r(t) - v_l(t)) \, dt$$

where $x(t)$, $y(t)$, and $\theta(t)$ are the robot’s x and y coordinates and orientation,
respectively; $v_r(t)$ and $v_l(t)$ are the right and left wheel speeds, respectively; and
$l$ is the distance between the two wheels. In FRP these equations can be written
as:

x     = (1/2) * integral ((vr + vl) * cos theta)
y     = (1/2) * integral ((vr + vl) * sin theta)
theta = (1/l) * integral (vr - vl)




All of the values in this FRP program are implicitly time-varying, and thus 
the explicit time t is not present.^ Nevertheless, the direct correspondence be- 
tween the physical equations (i.e. the specification) and the FRP code (i.e. the 
implementation) is very strong. 
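To make the conceptual model concrete, here is a minimal sketch that takes the reading "a signal is a function of time" literally and evaluates the three motion equations numerically. It is not how FRP or Yampa is implemented: the names integralS and wheelBase, the midpoint-rule integration, and the constant wheel speeds are all assumptions chosen for illustration.

```haskell
module Main where

type Time = Double
type Signal a = Time -> a

-- Approximate the integral of a signal from 0 to t with the midpoint rule.
-- (Illustrative helper; not part of any FRP library.)
integralS :: Signal Double -> Signal Double
integralS s t = sum [ s ((fromIntegral k + 0.5) * dt) * dt | k <- [0 .. steps - 1] ]
  where
    steps = 200 :: Int
    dt    = t / fromIntegral steps

-- Wheel speeds and wheel separation: assumed values, not taken from the text.
vr, vl :: Signal Double
vr = const 1.5
vl = const 0.5

wheelBase :: Double
wheelBase = 1.0

-- The three equations of motion, written with time made explicit.
theta, x, y :: Signal Double
theta t = (1 / wheelBase) * integralS (\u -> vr u - vl u) t
x t     = (1 / 2) * integralS (\u -> (vr u + vl u) * cos (theta u)) t
y t     = (1 / 2) * integralS (\u -> (vr u + vl u) * sin (theta u)) t

main :: IO ()
main = mapM_ print [ (t, x t, y t, theta t) | t <- [0, 0.5, 1.0] ]
```

With these particular wheel speeds the orientation grows linearly, so the exact solution is x(t) = sin t and y(t) = 1 - cos t, which the numerical values should approximate closely.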



2.1 Arrowized FRP 

Although quite general, the concept of a signal can lead to programs that have 
conspicuous time- and space-leaks,^ for reasons that are beyond the scope of 
this paper. Earlier versions of Fran, FAL, and FRP used various methods to 
make this performance problem less of an issue, but ultimately they all either 
suffered from the problem in one way or another, or introduced other problems 
as a result of fixing it. 

In Yampa, the problem is solved in a more radical way: signals are simply not 
allowed as first-class values! Instead, the programmer has access only to signal 
transformers, or what we prefer to call signal functions. A signal function is just 
a function that maps signals to signals: 

SF a b = Signal a -> Signal b 

However, the actual representation of the type SF in Yampa is hidden (i.e. SF 
is abstract), so one cannot directly build signal functions or apply them to sig- 
nals. Instead of allowing the user to define arbitrary signal functions from scratch 
(which makes it all too easy to introduce time- and space-leaks), we provide a set 
of primitive signal functions and a set of special composition operators (or “com- 
binators”) with which more complex signal functions may be defined. Together, 
these primitive values and combinators provide a disciplined way to define sig- 
nal functions that, fortuitously, avoids time- and space-leaks. We achieve this by 
structuring Yampa based on arrows, a generalization of monads proposed in [7]. 
Specifically, the type SF is made an instance of the Arrow class. 

So broadly speaking, a Yampa program expresses the composition of a possi- 
bly large number of signal functions into a composite signal function that is then 
“run” at the top level by a suitable interpreter. A good analogy for this idea is a 
state or IO monad, where the state is hidden, and a program consists of a linear
sequencing of actions that are eventually run by an interpreter or the operating 
system. But in fact arrows are more general than monads, and in particular the 
composition of signal functions does not have to be completely linear, as will 
be illustrated shortly. Indeed, because signal functions are abstract, we should
be concerned that we have a sufficient set of combinators to compose our signal
functions without loss of expressive power.

^ This implies that the sine, cosine, and arithmetic operators are overloaded to handle
signals properly.

^ A time-leak in a real-time system occurs whenever a time-dependent computation
falls behind the current time because its value or effect is not needed yet, but then
requires “catching up” at a later point in time. This catching up process can take
an arbitrarily long time, and may or may not consume space as well. It can destroy
any hope for real-time behavior if not managed properly.

We will motivate the set of combinators used to compose signal functions by 
using an analogy to so-called “point-free” functional programming (an example 
of which is the Bird/Meertens formalism [1]). For the simplest possible example, 
suppose that f1 :: A -> B and f2 :: B -> C. Then instead of writing:

g :: A -> C
g x = f2 (f1 x)

we can write in a point-free style using the familiar function composition
operator:

g = f2 . f1

This code is “point-free” in that the values (points) passed to and returned from
a function are never directly manipulated.

To do this at the level of signal functions, all we need is a primitive operator
to “lift” ordinary functions to the level of signal functions:

arr :: (a -> b) -> SF a b

and a primitive combinator to compose signal functions:

(>>>) :: SF a b -> SF b c -> SF a c

We can then write:

g' :: SF A C
g' = arr g
   = arr f1 >>> arr f2

Note that (>>>) actually represents reverse function composition, and thus its
arguments are reversed in comparison to (.).

Unfortunately, most programs are not simply linear compositions of functions,
and it is often the case that more than one input and/or output is needed.
For example, suppose that f1 :: A -> B, f2 :: A -> C and we wish to define
the following in point-free style:

h :: A -> (B,C)
h x = (f1 x, f2 x)

Perhaps the simplest way is to define a combinator:

(&) :: (a->b) -> (a->c) -> a -> (b,c)
(f1 & f2) x = (f1 x, f2 x)

which allows us to define h simply as:

h = f1 & f2
In Yampa there is a combinator (&&&) :: SF a b -> SF a c -> SF a (b,c)
that is analogous to &, thus allowing us to write:

h' :: SF A (B,C)
h' = arr h
   = arr f1 &&& arr f2

As another example, suppose that f1 :: A -> B and f2 :: C -> D. One
can easily write a point-free version of:

i :: (A,C) -> (B,D)
i (x,y) = (f1 x, f2 y)

by using (&) defined above and Haskell’s standard fst and snd operators:

i = (f1 . fst) & (f2 . snd)

For signal functions, all we need are analogous versions of fst and snd, which
we can achieve via lifting:

i' :: SF (A,C) (B,D)
i' = arr i
   = arr (f1 . fst) &&& arr (f2 . snd)
   = (arr fst >>> arr f1) &&& (arr snd >>> arr f2)

The “argument wiring” pattern captured by i' is in fact a common one, and
thus Yampa provides the following combinator:

(***) :: SF b c -> SF b' c' -> SF (b,b') (c,c')
f *** g = (arr fst >>> f) &&& (arr snd >>> g)

so that i' can be written simply as:

i' = arr f1 *** arr f2

g', h', and i' were derived from g, h, and i, respectively, by appealing to
one’s intuition about functions and their composition. In the next section we
will formalize this using type classes.

2.2 The Arrow Class 

One could go on and on in this manner, adding combinators as they are needed to 
solve particular “argument wiring” problems, but it behooves us at some point to 
ask if there is a minimal universal set of combinators that is sufficient to express 
all possible wirings. Note that so far we have introduced three combinators - 
arr, (>>>), and (&&&) - without definitions, and a fourth - (***) - was defined 
in terms of these three. Indeed these three combinators constitute a minimal 
universal set. 

However, this is not the only minimal set. In fact, in defining the original 
Arrow class, Hughes instead chose the set arr, (>>>), and first: 
class Arrow a where
  arr   :: (b -> c) -> a b c
  (>>>) :: a b c -> a c d -> a b d
  first :: a b c -> a (b,d) (c,d)

where first is analogous to the following function defined at the ordinary func- 
tion level: 

firstfun f = \(x,y) -> (f x, y) 

In Yampa, the type SF is an instance of class Arrow, and thus these types are 
consistent with what we presented earlier. To help see how this set is minimal,
here are definitions of second and (&&&) in terms of the Arrow class methods: 

second :: Arrow a => a b c -> a (d,b) (d,c)
second f = arr swap >>> first f >>> arr swap
  where swap pr = (snd pr, fst pr)

(&&&) :: Arrow a => a b c -> a b d -> a b (c,d)
f &&& g = arr (\b -> (b,b)) >>> (first f >>> second g)

In addition, here is an instance declaration that shows how Haskell’s normal 
function type can be treated as an arrow: 

instance Arrow (->) where
  arr f   = f
  f >>> g = g . f
  first f = \(b,d) -> (f b, d)

With this instance declaration, the derivations of g’, h’, and i’ in the previous 
section can be formally justified. 
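The claim can be checked directly with the function instance that ships in Control.Arrow from the base library. The sketch below instantiates g', h', and i' at the arrow (->) and rebuilds second and (&&&) from arr, (>>>), and first alone. The concrete functions f1, f2, f3 and the names second' and fanout are illustrative choices, not part of Yampa.

```haskell
module Main where

import Control.Arrow (Arrow, arr, first, (>>>), (&&&), (***))

-- Illustrative stand-ins for the abstract functions in the text.
f1 :: Int -> Int
f1 = (+ 1)

f2 :: Int -> String
f2 = show

f3 :: Int -> Bool
f3 = even

-- g', h', and i' from the text, with the arrow specialised to (->).
g' :: Int -> String
g' = arr f1 >>> arr f2

h' :: Int -> (Int, Bool)
h' = arr f1 &&& arr f3

i' :: (Int, Int) -> (Int, Bool)
i' = arr f1 *** arr f3

-- second and (&&&) expressed with arr, (>>>), and first only.
second' :: Arrow a => a b c -> a (d, b) (d, c)
second' f = arr swap >>> first f >>> arr swap
  where swap pr = (snd pr, fst pr)

fanout :: Arrow a => a b c -> a b d -> a b (c, d)
fanout f g = arr (\b -> (b, b)) >>> (first f >>> second' g)

main :: IO ()
main = do
  print (g' 41)            -- "42"
  print (h' 3)             -- (4,False)
  print (i' (1, 2))        -- (2,True)
  print (fanout f1 f3 3)   -- (4,False)
```

Because fanout is written against the Arrow class rather than against functions, the same definition would apply unchanged to any other arrow instance, including signal functions.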

Exercise 1. Define (a) first in terms of just arr, (>>>), and (&&&), (b) (***)
in terms of just first, second, and (>>>), and (c) (&&&) in terms of just arr,
(>>>), and (***).

2.3 Commonly Used Combinators 

In practice, it is better to think in terms of a commonly-used set of combinators 
rather than a minimal set. Figure 1 shows a set of eight combinators that we use 
often in Yampa programming, along with the graphical “wiring of arguments” 
that five of them imply. 

Yampa also provides many convenient library functions that facilitate pro- 
gramming in the arrow framework. Here are some that we will use later in this 
paper: 

identity :: SF a a
constant :: b -> SF a b
time     :: SF a Time
[Figure: wiring diagrams for (a) arr f, (b) sf1 >>> sf2, (c) first sf, (d) sf1 &&& sf2, (e) loop sf]

Fig. 1. Commonly Used Arrow Combinators

The identity signal function is analogous to the identity function in Haskell,
and in fact is equivalent to arr id. The constant function is useful for generat- 
ing constant signal functions, is analogous to Haskell’s const function, and in fact 
is equivalent to arr . const. Finally, time is a signal function that yields the 
current time, and is equivalent to constant 1.0 >>> integral, where integral 
is a pre-defined Yampa signal function with type:^ 

integral :: SF Double Double

Yampa also defines a derivative signal function. 

It is important to note that some signal functions are stateful, in that they
accumulate information over time. integral is a perfect example of such a function:
by definition, it sums instantaneous values of a signal over time. Stateful
signal functions cannot be defined using arr, which only lifts pure functions to 
the level of arrows. Stateful functions must either be pre-defined or be defined 
in terms of other stateful signal functions. 

Stated another way, stateful signal functions such as integration and differ- 
entiation depend intimately on the underlying time-varying semantics, and so 
do not have analogous unlifted forms. Indeed, it is so easy to lift unary functions 
to the level of signal functions that there is generally no need to provide spe- 
cial signal function versions of them. For example, instead of defining a special 
sinSF, cosSF, etc., we can just use arr sin, arr cos, etc. Furthermore, with 
the binary lifting operator: 

arr2 :: (a->b->c) -> SF (a,b) c
arr2 = arr . uncurry

we can also lift binary operators. For example, arr2 (+) has type 
Num a => SF (a, a) a. 
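The difference between lifted and stateful signal functions can be seen in a small toy model. The sketch below is an assumption made for illustration and is not Yampa's actual (hidden) representation: a signal function is modelled as a step function that consumes the time elapsed since the previous sample together with the current input, and returns an output plus its own continuation. The helper runSF and the rectangle-rule integral are likewise hypothetical.

```haskell
module Main where

import qualified Control.Category as C
import Control.Arrow

type DTime = Double

-- Toy model: one step yields an output sample and the signal function
-- to use for the next sample.
newtype SF a b = SF (DTime -> a -> (b, SF a b))

instance C.Category SF where
  id = SF (\_ a -> (a, C.id))
  SF g . SF f = SF $ \dt a ->
    let (b, f') = f dt a
        (c, g') = g dt b
    in  (c, f' >>> g')

instance Arrow SF where
  arr f = SF (\_ a -> (f a, arr f))
  first (SF f) = SF $ \dt (a, c) ->
    let (b, f') = f dt a
    in  ((b, c), first f')

-- Stateful: the running sum lives in the continuation, so this
-- cannot be obtained by lifting a pure function with arr.
integral :: SF Double Double
integral = go 0
  where
    go acc = SF $ \dt v ->
      let acc' = acc + v * dt
      in  (acc', go acc')

constant :: b -> SF a b
constant = arr . const

time :: SF a Double
time = constant 1.0 >>> integral

arr2 :: (a -> b -> c) -> SF (a, b) c
arr2 = arr . uncurry

-- Feed input samples spaced dt apart; the first sample is at local time 0.
runSF :: SF a b -> DTime -> [a] -> [b]
runSF sf dt = go sf 0
  where
    go _ _ [] = []
    go (SF f) d (a : as) = let (b, sf') = f d a in b : go sf' dt as

main :: IO ()
main = do
  print (runSF time 0.5 (replicate 4 ()))                -- [0.0,0.5,1.0,1.5]
  print (runSF (arr2 (+)) 0.5 [(1, 2), (3, 4 :: Int)])   -- [3,7]
  print (runSF ((arr sin &&& arr cos) >>> arr2 (+)) 0.5 [0 :: Double])
```

In this model arr f always returns itself as its continuation, which is exactly why it can express only stateless behaviour, whereas integral returns a different continuation after every sample.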



2.4 A Simple Example 

To see all of this in action, consider the FRP code presented earlier for the 
coordinates and orientation of the mobile robot. We will rewrite the code for the 
x-coordinate in Yampa (leaving the y-coordinate and orientation as an exercise).

Suppose there are signal functions vrSF, vlSF :: SF SimbotInput Speed
and thetaSF :: SF SimbotInput Angle. The type SimbotInput represents the
input state of the simbot, which we will have much more to say about later. With
these signal functions in hand, the previous FRP code for x:

x = (1/2) * integral ((vr + vl) * cos theta)

can be rewritten in Yampa as:

xSF :: SF SimbotInput Distance
xSF = let v = (vrSF &&& vlSF) >>> arr2 (+)
          t = thetaSF >>> arr cos
      in (v &&& t) >>> arr2 (*) >>> integral >>> arr (/2)

^ This function is actually overloaded for any vector space, but that does not concern
us here, and thus we have specialized it to Double.

Exercise 2. Define signal functions ySF and thetaSF in Yampa that correspond 
to the definitions of y and theta, respectively, in FRP. 

2.5 Arrow Syntax 

Although we have achieved the goal of preventing direct access to signals, one 
might argue that we have lost the clarity of the original FRP code: the code 
for xSF is certainly more difficult to understand than that for x. Most of the 
complexity is due to the need to wire signal functions together using the various 
pairing/unpairing combinators such as (&&&) and (***). Precisely to address 
this problem, Paterson [9] has suggested the use of special syntax to make arrow 
programming more readable, and has written a preprocessor that converts the 
syntactic sugar into conventional Haskell code. Using this special arrow syntax, 
the above Yampa code for xSF can be rewritten as: 

xSF' :: SF SimbotInput Distance
xSF' = proc inp -> do
  vr    <- vrSF     -< inp
  vl    <- vlSF     -< inp
  theta <- thetaSF  -< inp
  i     <- integral -< (vr+vl) * cos theta
  returnA -< (i/2)

Although not quite as readable as the original FRP definition of x, this code is 
far better than the unsugared version. There are several things to note about
the structure of this code: 

1. The syntax proc pat -> . . . is analogous to a Haskell lambda expression 
of the form \ pat -> ... , except that it defines a signal function rather 
than a normal Haskell function. 

2. In the syntax pat <- SFexpr -< expr, the expression SFexpr must be a
signal function, say of type SF T1 T2, in which case expr must have type
T1 and pat must have type T2. This is analogous to pat = expr1 expr2 in a
Haskell let or where clause, in which case if expr1 has type T1 -> T2, then
expr2 must have type T1 and pat must have type T2.

3. The overall syntax:

proc pat -> do
  pat1 <- SFexpr1 -< expr1
  pat2 <- SFexpr2 -< expr2
  ...
  returnA -< expr

defines a signal function. If pat has type T1 and expr has type T2, then the
type of the signal function is SF T1 T2. In addition, any variable bound by
one of the patterns pat_i can only be used in the expression expr or in an
expression expr_j where j > i. In particular, it cannot be used in any of the
signal function expressions SFexpr_i.

It is important to note that the arrow syntax allows one to get a handle 
on a signal’s values (or samples), but not on the signals themselves. In other 
words, first recalling that a signal function SF a b can be thought of as a type 
Signal a -> Signal b, which in turn can be thought of as type 
(Time -> a) -> (Time -> b), the syntax allows getting a handle on values of
type a and b, but not on values of type Time -> a or Time -> b. 
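The arrow syntax is available in GHC through the Arrows language extension and works for any Arrow instance, so its relationship to the combinator style can be tried out on ordinary functions. The sketch below covers only the stateless part of xSF (the integrand, halved), since the function arrow has no integral. The record Inp and the names rate and rate' are hypothetical stand-ins for SimbotInput and the signal functions of the text.

```haskell
{-# LANGUAGE Arrows #-}
module Main where

import Control.Arrow

-- Hypothetical input record standing in for SimbotInput.
data Inp = Inp { inVr :: Double, inVl :: Double, inTheta :: Double }

-- Combinator style: explicit pairing and unpairing of intermediate values.
rate :: Inp -> Double
rate =
  let v = (arr inVr &&& arr inVl) >>> arr (uncurry (+))
      t = arr inTheta >>> arr cos
  in  (v &&& t) >>> arr (uncurry (*)) >>> arr (/ 2)

-- Arrow syntax: r, l, and th name sample values, never whole signals.
-- They may appear to the right of -< but not in the arrow expressions
-- to the left of it.
rate' :: Inp -> Double
rate' = proc inp -> do
  r  <- arr inVr    -< inp
  l  <- arr inVl    -< inp
  th <- arr inTheta -< inp
  returnA -< (r + l) * cos th / 2

main :: IO ()
main = do
  let inp = Inp 1.5 0.5 0
  print (rate inp, rate' inp)   -- (1.0,1.0)
```

Both definitions compute the same function; the second simply lets the compiler generate the wiring that the first spells out by hand.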

Figure 2(a) is a signal flow diagram that precisely represents the wiring implied
by the sugared definition of xSF'. (It also reflects well the data dependencies
in the original FRP program for x.) Figure 2(b) shows the same diagram, except
that it has been overlaid with the combinator applications implied by the unsugared
definition of xSF (for clarity, the lifting via arr of the primitive functions,
i.e. those drawn with circles, is omitted). These diagrams demonstrate nicely
the relationship between the sugared and unsugared forms of Yampa programs.

Exercise 3. Rewrite the definitions of ySF and thetaSF from the previous exer- 
cise using the arrow syntax. Also draw their signal flow diagrams. 



2.6 Discrete Events and Switching 

Most programming languages have some kind of conditional choice capability, 
and Yampa is no exception. Indeed, given signal functions flag :: SF a Bool
and sfx, sfy :: SF a b, then the signal function:

sf :: SF a b
sf = proc i -> do
  x <- sfx  -< i
  y <- sfy  -< i
  b <- flag -< i
  returnA -< if b then x else y

behaves like sfx whenever flag yields a true value, and like sfy whenever it 
yields false. 

However, this is not completely satisfactory, because there are many situ- 
ations where one would prefer that a signal function switch into, or literally 
become, some other signal function, rather than continually alternate between 
two signal functions based on the value of a boolean. Indeed, there is often a 
succession of new signal functions to switch into as a succession of particular 
events occurs, much like state changes in a finite state automaton. Furthermore, 
we would like for these newly invoked signal functions to start afresh from time 
zero, rather than being signal functions that have been “running” since the
program began. This relates precisely to the issue of “statefulness” that was
previously discussed.

Fig. 2. Signal Flow Diagrams for xSF: (a) Sugared, (b) Unsugared

This advanced functionality is achieved in Yampa using events and switching
combinators.

In previous versions of FRP, including Fran, Frob, and FAL, a significant 
distinction was made between continuous values and discrete events. In Yampa 
this distinction is not as great. Events in Yampa are just abstract values that are 
isomorphic to Haskell’s Maybe data type. A signal of type Signal (Event b) 
is called an event stream, and is a signal that, at any point in time, yields 
either nothing or an event carrying a value of type b. A signal function of type 
SF a (Event b) generates an event stream, and is called an event source. 

Note: Although event streams and continuous values are both represented as 
signals in Yampa, there are important semantic differences between them. For 
example, improper use of events may lead to programs that are not convergent, 
or that allow the underlying sampling rate to “show through” in the program’s 
behavior. Semantically speaking, event streams in Yampa should not be “in- 
finitely dense” in time; practically speaking, their frequency should not exceed 
the internal sampling rate unless buffering is provided.® 

As an example of a well-defined event source, the signal function: 

rsStuck :: SF SimbotInput (Event ())

generates an event stream whose events correspond to the moments when the 
robot gets “stuck:” that is, an event is generated every time the robot’s motion 
is blocked by an obstacle that it has run into. 

What makes event streams special is that there is a special set of func- 
tions that use event streams to achieve various kinds of switching. The simplest 
switching combinator is called switch, whose type is given by: 

switch :: SF a (b, Event c) -> (c -> SF a b) -> SF a b

The expression (sf1 &&& es) `switch` \e -> sf2 behaves as sf1 until the
first event in the event stream es occurs, at which point the event’s value is
bound to e and the behavior switches over to sf2.

For example, in order to prevent damage to a robot wheel’s motor, we may 
wish to set its speed to zero when the robot gets stuck: 

xspd :: Speed -> SF SimbotInput Speed
xspd v = (constant v &&& rsStuck) `switch` \() -> constant 0

It should be clear that stateful Yampa programs can be constructed using switch- 
ing combinators. 

Exercise 4. Rather than set the wheel speed to zero when the robot gets stuck,
negate it instead. Then define xspd recursively so that the velocity gets negated 
every time the robot gets stuck. 

® Certain input events such as key presses are in fact properly buffered in our implementation
such that none will be lost.
Switching semantics. There are several kinds of switching combinators in
Yampa, four of which we will use in this paper. These four switchers arise out 
of two choices in the semantics: 

1. Whether or not the switch happens exactly at the time of the event, or 
infinitesimally just after. In the latter case, a “d” (for “delayed”) is prefixed 
to the name switch. 

2. Whether or not the switch happens just for the first event in an event stream, 
or for every event. In the latter case, an “r” (for “recurring”) is prefixed to 
the name switch. 

This leads to the four switchers, whose names and types are: 

switch, dSwitch   :: SF a (b, Event c) -> (c -> SF a b) -> SF a b
rSwitch, drSwitch :: SF a b -> SF (a, Event (SF a b)) b

An example of the use of switch was given above. Delayed switching is useful 
for certain kinds of recursive signal functions. In Sec. 2.7 we will see an example 
of the use of drSwitch. 

As mentioned earlier, an important property of switching is that time begins 
afresh within each signal function being switched into. For example, consider the 
expression: 

let sinSF = time >>> arr sin
in (sinSF &&& rsStuck) `switch` const sinSF

sinSF to the left of the switch generates a sinusoidal signal. If the first event 
generated by rsStuck happens at time t, then the sinSF on the right will begin 
at time 0, regardless of what the time t is; i.e. the sinusoidal signal will start 
over at the time of the event. 
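Both the one-shot behaviour of switch and the restart of local time can be observed in a toy model of signal functions. Everything in the sketch below is an assumption made for illustration, not Yampa's implementation: the step-function representation of SF, the rectangle-rule integral, the helper runSF, and the use of the input itself as the event stream (stuck stands in for rsStuck). Restarting is modelled by stepping the new signal function with an elapsed time of 0 at the moment of the event.

```haskell
module Main where

import qualified Control.Category as C
import Control.Arrow

type DTime = Double

-- Toy model: a step returns an output sample and a continuation.
newtype SF a b = SF (DTime -> a -> (b, SF a b))

instance C.Category SF where
  id = SF (\_ a -> (a, C.id))
  SF g . SF f = SF $ \dt a ->
    let (b, f') = f dt a
        (c, g') = g dt b
    in  (c, f' >>> g')

instance Arrow SF where
  arr f = SF (\_ a -> (f a, arr f))
  first (SF f) = SF $ \dt (a, c) ->
    let (b, f') = f dt a
    in  ((b, c), first f')

data Event a = NoEvent | Event a

integral :: SF Double Double
integral = go 0
  where
    go acc = SF $ \dt v ->
      let acc' = acc + v * dt
      in  (acc', go acc')

constant :: b -> SF a b
constant = arr . const

time :: SF a Double
time = constant 1 >>> integral

-- Behave like the first signal function until its event component fires;
-- then become (k c), started with dt = 0 so that its local time is fresh.
switch :: SF a (b, Event c) -> (c -> SF a b) -> SF a b
switch (SF f) k = SF $ \dt a ->
  case f dt a of
    ((_, Event c), _)  -> let SF g = k c in g 0 a
    ((b, NoEvent), f') -> (b, switch f' k)

-- The input stream itself carries the "stuck" events in this sketch.
stuck :: SF (Event ()) (Event ())
stuck = arr id

xspd :: Double -> SF (Event ()) Double
xspd v = (constant v &&& stuck) `switch` \() -> constant 0

restart :: SF (Event ()) Double
restart = (time &&& stuck) `switch` const time

-- Feed input samples spaced dt apart; the first sample is at local time 0.
runSF :: SF a b -> DTime -> [a] -> [b]
runSF sf dt = go sf 0
  where
    go _ _ [] = []
    go (SF f) d (a : as) = let (b, sf') = f d a in b : go sf' dt as

main :: IO ()
main = do
  let evs = [NoEvent, NoEvent, Event (), NoEvent, NoEvent]
  print (runSF (xspd 2) 1 evs)   -- [2.0,2.0,0.0,0.0,0.0]
  print (runSF restart 1 evs)    -- [0.0,1.0,0.0,1.0,2.0]
```

The second output shows the clock reaching 1, then starting again from 0 at the sample where the event occurs, which mirrors the sinusoid example in the text.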

Useful event functions. Event is an instance of class Functor, and thus 
fmap can be used to change the value carried by an event. For example, we can 
increment the value of an event e : : Event Double by writing fmap (+1) e. 
Sometimes we don’t care about the old value of an event when creating a new 
one, so Yampa also provides: 

tag :: Event a -> b -> Event b
e `tag` b = fmap (const b) e

It is often desirable to merge events; for example, to form the disjunction of 
two logical events. The only problem is deciding what to do with simultaneous 
events. The most general form of merge: 

mergeBy :: (a -> a -> a) -> Event a -> Event a -> Event a

allows the user to decide how to handle simultaneous events by providing a 
function to combine the event values. Alternatively, one may choose to give 
preference to the left or right event: 
lMerge :: Event a -> Event a -> Event a
rMerge :: Event a -> Event a -> Event a

If there is no possibility of simultaneous events, merge may be used, which 
generates an error if in fact two events occur together: 

merge :: Event a -> Event a -> Event a

So far we have only considered pre-existing events. Some of these may come 
from external sources, such as a bumper switch or communications subsystem, 
but it is often convenient to define our own events. Yampa provides a variety of 
ways to generate new events, the most important being: 

edge :: SF Bool (Event ()) 

The expression boolSF >>> edge generates an event every time the signal from 
boolSF goes from False to True (i.e. the “leading edge” of the signal). For 
example, if tempSF :: SF SimbotInput Temp is a signal function that indicates
temperature, then: 

alarmSF :: SF SimbotInput (Event ())
alarmSF = tempSF >>> arr (>100) >>> edge

generates an alarm event each time the temperature rises above 100 degrees.

Here are a few other useful event generation functions: 

never      :: SF a (Event b)
now        :: b -> SF a (Event b)
after      :: Time -> b -> SF a (Event b)
repeatedly :: Time -> b -> SF a (Event b)

never is an event source that never generates any event occurrences. now v
generates exactly one event, whose time of occurrence is zero (i.e. now) and whose
value is v. The expression after t v generates exactly one event, whose time 
of occurrence is t and whose value is v. Similarly, repeatedly t v generates an 
event every t seconds, each with value v. 

To close this section, we point out that the discrete and continuous worlds 
interact in important ways, with switching, of course, being the most funda- 
mental. But Yampa also provides several other useful functions to capture this 
interaction. Here are two of them: 

hold  :: a -> SF (Event a) a
accum :: a -> SF (Event (a -> a)) (Event a)

The signal function hold v initially generates a signal with constant value v, 
but every time an event occurs with value v’, the signal takes on (i.e. “holds”) 
that new value v’. The signal function accum v0 is essentially an event stream
transformer. Each input event generates one output event. If $f_n$ is the function
corresponding to the nth input event, then the value $v_n$ of the nth output event
is just $v_n = f_n\, v_{n-1}$, for $n \geq 1$, with $v_0$ = v0.

For example, the following signal function represents the number of alarms 
generated from alarmSF defined earlier: 
alarmCountSF :: SF SimbotInput Int
alarmCountSF = alarmSF >>> arr (`tag` (+1)) >>> accum 0 >>> hold 0

Indeed, the accum followed by hold idiom is so common that it is predefined in 
Yampa: 

accumHold :: a -> SF (Event (a -> a)) a
accumHold init = accum init >>> hold init
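The behaviour of these event functions can be traced on sampled data. The sketch below models a sampled signal as a plain list, which is an assumption made for illustration rather than Yampa's representation. The list-level helpers edgeS, holdS, and accumS are hypothetical counterparts of edge, hold, and accum, and edgeS is assumed to report no event on the very first sample.

```haskell
module Main where

data Event a = NoEvent | Event a
  deriving (Eq, Show)

instance Functor Event where
  fmap _ NoEvent   = NoEvent
  fmap f (Event a) = Event (f a)

tag :: Event a -> b -> Event b
e `tag` b = fmap (const b) e

-- Combine simultaneous events with f; otherwise pass on whichever occurs.
mergeBy :: (a -> a -> a) -> Event a -> Event a -> Event a
mergeBy f (Event l) (Event r) = Event (f l r)
mergeBy _ NoEvent   r         = r
mergeBy _ l         NoEvent   = l

-- Prefer the left event when both occur.
lMerge :: Event a -> Event a -> Event a
lMerge = mergeBy const

-- An event on each False-to-True transition of a sampled Boolean signal.
edgeS :: [Bool] -> [Event ()]
edgeS bs = zipWith rising (True : bs) bs
  where rising prev cur = if not prev && cur then Event () else NoEvent

-- Keep the value of the most recent event, starting from v.
holdS :: a -> [Event a] -> [a]
holdS _ []              = []
holdS v (NoEvent  : es) = v  : holdS v es
holdS _ (Event v' : es) = v' : holdS v' es

-- Apply each event's function to the accumulated value.
accumS :: a -> [Event (a -> a)] -> [Event a]
accumS _ []             = []
accumS v (NoEvent : es) = NoEvent : accumS v es
accumS v (Event f : es) = let v' = f v in Event v' : accumS v' es

-- List-level analogue of alarmCountSF.
alarmCount :: [Double] -> [Int]
alarmCount temps =
  holdS 0 (accumS 0 (map (`tag` (+ 1)) (edgeS (map (> 100) temps))))

main :: IO ()
main = do
  print (alarmCount [90, 101, 102, 99, 105])           -- [0,1,1,1,2]
  print (lMerge (Event 1) (Event (2 :: Int)))          -- Event 1
  print (mergeBy (+) (Event 1) (Event (2 :: Int)))     -- Event 3
```

Note that the count increases only on the two samples where the temperature crosses the threshold from below, not on every sample above 100.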

Exercise 5. Suppose v :: SF SimbotInput Velocity represents the scalar velocity
of a simbot. If we integrate this velocity, we get a measure of how far the
simbot has traveled. Define an alarm that generates an event when either the 
simbot has traveled more than d meters, or it has gotten stuck. 

2.7 Recursive Signals 

Note in Fig. 1 the presence of the loop combinator. Its purpose is to define 
recursive signal functions; i.e. it is a fixpoint operator. The arrow syntax goes 
one step further by allowing recursive definitions to be programmed directly, 
which the preprocessor expands into applications of the loop combinator. In this 
case the user must include the keyword rec prior to the collection of recursive 
bindings. 

For example, a common need when switching is to take a “snapshot” of the 
signal being switched out of, for use in computing the value of the signal being 
switched into. Suppose that there is an event source
incrVelEvs :: SF SimbotInput (Event ()) whose events correspond to commands
to increment the velocity. We can define a signal function that responds
to these commands as follows:

vel :: Velocity -> SF SimbotInput Velocity
vel v0 = proc inp -> do
  rec e <- incrVelEvs -< inp
      v <- drSwitch (constant v0) -< (inp, e `tag` constant (v+1))
  returnA -< v

Note that v is recursively defined. This requires the use of the rec keyword, and 
also the use of a delayed switch to ensure that the recursion is well founded. Also 
note that the recurring version of switch is used, because we want the velocity 
update to happen on every event. Finally, note the use of tag to update the 
value of an event. 

The need for a delayed switch is perhaps best motivated by analogy to re- 
cursively defined lists, or streams. The definition: 

ones = 1 : ones 

expresses the usual infinite stream of ones, and is obviously well founded, whereas 
the list:

ones = ones

is obviously not well founded. The value of 1 placed at the front of the list
can be thought of as a delay in the access of ones. That is the idea behind a 
delayed switch, although semantically the delay is intended to be infinitesimally 
small, and in the implementation we avoid introducing a delay that could affect 
performance. 
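The stream analogy can be pushed one step further to mimic the recursive velocity definition itself. In the sketch below, a list stands for a sampled signal and consing the initial value onto the stream plays the role of the delay. This is only an illustrative model, and the names velocities and bump are hypothetical. Each output depends on the previous output, never on itself, which is what makes the recursion well founded.

```haskell
module Main where

-- Well founded: the head is available before the recursive reference is used.
ones :: [Int]
ones = 1 : ones

-- Sampled velocity that is incremented after each sample carrying a command.
-- The leading v0 delays the feedback by one sample.
velocities :: Double -> [Bool] -> [Double]
velocities v0 incr = vs
  where
    vs = v0 : zipWith bump incr vs
    bump e prev = if e then prev + 1 else prev

main :: IO ()
main = do
  print (take 3 ones)                                   -- [1,1,1]
  print (velocities 1 [False, True, False, True])       -- [1.0,1.0,2.0,2.0,3.0]
```

Dropping the leading v0 would give vs = zipWith bump incr vs, which, like ones = ones, demands its first element in order to compute its first element and therefore never produces a value.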

Exercise 6. Redefine vel using dSwitch instead of drSwitch, and without using 
the rec keyword. (Hint: define vel recursively instead of defining v recursively.) 

3 Programming the Robot Simulator 

3.1 Robot Input and Output 

Generally speaking, one might have dozens of different robots, some real, some 
simulated, and each with different kinds of functionality (two wheels, three 
wheels, four wheels, cameras, sonars, bumper switches, actuators, speakers, flashing
lights, missile launchers, and so on). These differences are captured in the
input and output types of the robot. For example, there is only one kind of
simulated robot, or simbot, whose input type is SimbotInput and whose output
type is SimbotOutput.

[Note: The code described in this section works with Yampa version 0.9 (and 
0.9.x patches), but some changes are anticipated for use with future Yampa 
versions 1.0 and higher. In particular, the module names will change. In Yampa 
0.9 they are still known under their old names (AFrob, AFrobRobotSim, etc.) for 
backwards compatibility reasons.] 

We refer to the collection of Yampa libraries that are robot-specific as AFrob. 
The AFrob library was written to be as generic as possible, and thus it does not 
depend directly on the robot input and output types. Rather, type classes are 
used to capture different kinds of functionality. Each robot type is an instance 
of some subset of these classes, depending on the functionality it has to offer. 

For example, SimbotInput is a member of the type classes shown in the
upper half of Fig. 3, and SimbotOutput is a member of the lower ones. The types 
Velocity, Distance, Angle, RotVel, RotAcc, Length, Acceleration, Speed, 
Heading, and Bearing are all synonyms for type Double. Type Position2 is a 
synonym for Point2 Position, where: 

data RealFloat a => Point2 a = Point2 !a !a 
deriving Eq 

We will give examples of the use of many of these operations and type classes 
in the examples that follow. Before doing so, however, there is one other detail 
to describe about the output classes. Note in Fig. 3 that the methods in the last 
two classes return a type MR a, where a is constrained to be a MergeableRecord. 
This allows one to incrementally specify certain “fields” of the record, and to 
merge them later. There are two key operations on mergeable records: 

mrMerge    :: MergeableRecord a => MR a -> MR a -> MR a
mrFinalize :: MergeableRecord a => MR a -> a
-- Input Classes And Related Functions

class HasRobotStatus i where
  rsBattStat :: i -> BatteryStatus  -- Current battery status
  rsIsStuck  :: i -> Bool           -- Currently stuck or not

data BatteryStatus = BSHigh | BSLow | BSCritical
                     deriving (Eq, Show)

-- derived event sources:
rsBattStatChanged  :: HasRobotStatus i => SF i (Event BatteryStatus)
rsBattStatLow      :: HasRobotStatus i => SF i (Event ())
rsBattStatCritical :: HasRobotStatus i => SF i (Event ())
rsStuck            :: HasRobotStatus i => SF i (Event ())

class HasOdometry i where
  odometryPosition :: i -> Position2  -- Current position
  odometryHeading  :: i -> Heading    -- Current heading

class HasRangeFinder i where
  rfRange    :: i -> Angle -> Distance
  rfMaxRange :: i -> Distance

-- derived range finders:
rfFront :: HasRangeFinder i => i -> Distance
rfBack  :: HasRangeFinder i => i -> Distance
rfLeft  :: HasRangeFinder i => i -> Distance
rfRight :: HasRangeFinder i => i -> Distance

class HasAnimateObjectTracker i where
  aotOtherRobots :: i -> [(RobotType, Angle, Distance)]
  aotBalls       :: i -> [(Angle, Distance)]

class HasTextualConsoleInput i where
  tciKey :: i -> Maybe Char

tciNewKeyDown :: HasTextualConsoleInput i =>
                 Maybe Char -> SF i (Event Char)
tciKeyDown    :: HasTextualConsoleInput i => SF i (Event Char)

-- Output Classes And Related Functions

class MergeableRecord o => HasDiffDrive o where
  ddBrake   :: MR o                          -- Brake both wheels
  ddVelDiff :: Velocity -> Velocity -> MR o  -- set wheel velocities
  ddVelTR   :: Velocity -> RotVel -> MR o    -- set vel. and rot.

class MergeableRecord o => HasTextConsoleOutput o where
  tcoPrintMessage :: Event String -> MR o

Fig. 3. Robot Input and Output Classes

For example, the expression:

sbo :: SimbotOutput
sbo = mrFinalize
        (ddVelDiff vel1 vel2 `mrMerge` tcoPrintMessage stringEvent)

merges the velocity output with a console message. 

For simbots, it turns out that velocity control and message output are the 
only two things that can be merged, so the use of the MergeableRecord class 
may seem like an overkill. However, for other robots there may be many such 
mergeable outputs, and the functionality thus offered is quite convenient. 

When two common outputs are merged, the result depends on how the 
mrMerge and mrFinalize methods are defined to behave. The designer of a par- 
ticular instance of these methods might signal an error, accept one output or the 
other (for example, merging two calls to ddVelDiff yields the value of the first 
one), or combine the two (for example, merging two calls to tcoPrintMessage 
results in both messages being printed in order). 
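
The merge behaviors just described can be illustrated with a small self-contained model. This is only a sketch under our own assumptions: the record type Out, the helpers setVel and printMsg, and the representation of MR as a pending update are hypothetical stand-ins chosen for illustration, not AFrob's actual definitions.

```haskell
-- A toy model of mergeable records. The names mrMerge and mrFinalize
-- mirror the text, but every definition here is an illustrative assumption.

-- Hypothetical output record: an optional wheel-velocity pair plus messages.
data Out = Out
  { outVel  :: Maybe (Double, Double)
  , outMsgs :: [String]
  } deriving (Eq, Show)

-- A mergeable record is modelled as a pending update to a default value.
newtype MR a = MR (a -> a)

class MergeableRecord a where
  mrDefault :: a

instance MergeableRecord Out where
  mrDefault = Out Nothing []

-- Merging runs the left update first, then the right one.
mrMerge :: MergeableRecord a => MR a -> MR a -> MR a
mrMerge (MR f) (MR g) = MR (g . f)

-- Finalizing applies the accumulated updates to the default record.
mrFinalize :: MergeableRecord a => MR a -> a
mrFinalize (MR f) = f mrDefault

-- Setting the velocity keeps an earlier setting if there is one
-- ("accept one output or the other").
setVel :: Double -> Double -> MR Out
setVel l r = MR $ \o -> case outVel o of
  Nothing -> o { outVel = Just (l, r) }
  Just _  -> o

-- Printing appends, so merged messages come out in order
-- ("combine the two").
printMsg :: String -> MR Out
printMsg s = MR $ \o -> o { outMsgs = outMsgs o ++ [s] }

example :: Out
example = mrFinalize
  (setVel 1 1 `mrMerge` printMsg "hi" `mrMerge` setVel 2 2 `mrMerge` printMsg "there")
```

In this model, example evaluates to Out (Just (1.0,1.0)) ["hi","there"]: the first velocity setting wins and both messages are kept in order. A designer who preferred to signal an error on conflicting velocities would change only setVel.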

3.2 Robot Controllers 

To control a robot we must define a robot controller, which, for the case of 
simbots, must have type: 

type SimbotController =
       SimbotProperties -> SF SimbotInput SimbotOutput

SimbotProperties is a data type that specifies static properties of a simbot. 
These properties are accessed abstractly in that SimbotProperties is an in- 
stance of the HasRobotProperties type class: 



class HasRobotProperties i where
  rpType     :: i -> RobotType      -- Type of robot
  rpId       :: i -> RobotId        -- Identity of robot
  rpDiameter :: i -> Length         -- Distance between wheels
  rpAccMax   :: i -> Acceleration   -- Max translational acc
  rpWSMax    :: i -> Speed          -- Max wheel speed

type RobotType = String
type RobotId   = Int





The simulator knows about two versions of the simbot, for which each of these 
properties is slightly different. The RobotType field is just a string, which for 
the simbots will be either "SimbotA" or "SimbotB". The remaining fields are 
self-explanatory. 

To actually run the simulator, we use the function: 

runSim :: Maybe WorldTemplate ->
          SimbotController -> SimbotController -> IO ()

where a WorldTemplate is a data type that describes the initial state of the
simulator world. It is a list of simbots, walls, balls, and blocks, along with
locations of the centers of each:



type WorldTemplate = [ObjectTemplate]

data ObjectTemplate =
    OTBlock   { otPos  :: Position2 }   -- Square obstacle
  | OTVWall   { otPos  :: Position2 }   -- Vertical wall
  | OTHWall   { otPos  :: Position2 }   -- Horizontal wall
  | OTBall    { otPos  :: Position2 }   -- Ball
  | OTSimbotA { otRId  :: RobotId,      -- Simbot A robot
                otPos  :: Position2,
                otHdng :: Heading }
  | OTSimbotB { otRId  :: RobotId,      -- Simbot B robot
                otPos  :: Position2,
                otHdng :: Heading }



The constants worldXMin, worldYMin, worldXMax, and worldYMax are the bounds 
of the simulated world, and are assumed to be in meters. Currently these values 
are -5, -5, 5, and 5, respectively (i.e. the world is 10 meters by 10 meters, with 
the center coordinate being (0, 0)). The walls are currently fixed in size at 1.0 m 
by 0.1m, and the blocks are 0.5 m by 0.5 m. The diameter of a simbot is 0.5 m. 
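
As an illustration, here is what a small world template might look like, together with a check that every object's center lies inside the bounds quoted above. To keep the sketch self-contained we redefine stand-in types locally; in particular, Position2 is assumed here to be a plain pair of Doubles rather than AFrob's Point2, and sampleWorld and inBounds are hypothetical names of our own.

```haskell
-- Stand-in types so the sketch compiles on its own; in a real program
-- AFrob's definitions would be imported instead.
type Position2 = (Double, Double)   -- assumption: a plain pair, not Point2
type Heading   = Double
type RobotId   = Int

data ObjectTemplate
  = OTBlock   { otPos :: Position2 }
  | OTVWall   { otPos :: Position2 }
  | OTHWall   { otPos :: Position2 }
  | OTBall    { otPos :: Position2 }
  | OTSimbotA { otRId :: RobotId, otPos :: Position2, otHdng :: Heading }
  | OTSimbotB { otRId :: RobotId, otPos :: Position2, otHdng :: Heading }
  deriving (Eq, Show)

type WorldTemplate = [ObjectTemplate]

-- World bounds as quoted in the text (meters).
worldXMin, worldYMin, worldXMax, worldYMax :: Double
worldXMin = -5
worldYMin = -5
worldXMax = 5
worldYMax = 5

-- A hypothetical world: two simbots facing each other, a ball between
-- them, one block, and one horizontal wall.
sampleWorld :: WorldTemplate
sampleWorld =
  [ OTSimbotA 1 (-3, 0) 0
  , OTSimbotB 1 (3, 0) pi
  , OTBall    (0, 0)
  , OTBlock   (0, 2)
  , OTHWall   (0, -4)
  ]

-- True when an object's center lies within the world bounds.
inBounds :: ObjectTemplate -> Bool
inBounds o = worldXMin <= x && x <= worldXMax
          && worldYMin <= y && y <= worldYMax
  where (x, y) = otPos o
```

With these definitions, all inBounds sampleWorld holds. The check only considers centers; a wall or block near the edge could still extend past the boundary.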

Your overall program should be structured as follows: 



module MyRobotShow where

import AFrob
import AFrobRobotSim

main :: IO ()
main = runSim (Just world) rcA rcB

world :: WorldTemplate
world = ...

rcA :: SimbotController    -- controller for simbot A’s
rcA = ...

rcB :: SimbotController    -- controller for simbot B’s
rcB = ...

The module AFrob also imports the Yampa library. The module AFrobRobotSim
is the robot simulator. 

Note that many robots may be created of the same kind (i.e. simbot A or 
simbot B) in the world template, but the same controller will be invoked for all 
of them. If you want to distinguish amongst them, simply give them different 
RobotIds. For example, if you have three simbot A robots, then your code for
controller rcA can be structured like this: 

rcA :: SimbotController
rcA rProps =
  case rpId rProps of
    1 -> rcA1 rProps
    2 -> rcA2 rProps
    3 -> rcA3 rProps

rcA1, rcA2, rcA3 :: SimbotController
rcA1 = ...
rcA2 = ...
rcA3 = ...



3.3 Basic Robot Movement 

In this section we will write a series of robot controllers, each of type 
SimbotController. Designing controllers for real robots is both an art and a 
science. The science part includes the use of control theory and related math- 
ematical techniques that focus on differential equations to design optimal con- 
trollers for specific tasks. We will not spend any time on control theory here, and 
instead will appeal to the reader’s intuition in the design of functional, if not 
optimal, controllers for mostly simple tasks. For more details on the kinematics 
of mobile robots, see [3] . 



Stop, go, and turn. For starters, let’s define the world’s dumbest controller - 
one for a stationary simbot: 

rcStop :: SimbotController
rcStop _ = constant (mrFinalize ddBrake)

Or we could make the simbot move blindly forward at a constant velocity:

rcBlind1 _ = constant (mrFinalize $ ddVelDiff 10 10)

We can do one better than this, however, by first determining the maximal 
allowable wheel speeds and then running the simbot at, say, one-half that speed: 

rcBlind2 rps =
  let max = rpWSMax rps
  in constant (mrFinalize $ ddVelDiff (max/2) (max/2))

We can also control the simbot through ddVelTR, which allows specifying
the simbot’s forward and rotational velocities, rather than the individual wheel 
speeds. For a differential drive robot, the maximal rotational velocity depends on 
the vehicle’s forward velocity; it can rotate most quickly when it is standing still, 
and cannot rotate at all if it is going at its maximal forward velocity (because to 
turn while going at its maximal velocity, one of the wheels would have to slow 
down, in which case it would no longer be going at its maximal velocity). If the 
maximal wheel velocity is $v_{max}$, the forward velocity is $v_f$, and the
distance between the wheels is $l$, then it is easy to show that the maximal
rotational velocity in radians per second is given by:

$\omega_{max} = \frac{2 (v_{max} - v_f)}{l}$

For example, this simbot turns as fast as possible while going at a given speed: 

rcTurn :: Velocity -> SimbotController
rcTurn vel rps =
  let vMax = rpWSMax rps
      rMax = 2 * (vMax - vel) / rpDiameter rps
  in constant (mrFinalize $ ddVelTR vel rMax)
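
The bound used in rcTurn can be checked with a few lines of arithmetic. The sketch below assumes the standard differential-drive kinematics, which the text does not spell out: for wheel separation l, forward velocity v, and rotational velocity w, the left and right wheels move at v - w*l/2 and v + w*l/2. The function names are our own.

```haskell
-- Wheel speeds (left, right) for a differential drive with wheel
-- separation l, forward velocity v and rotational velocity w.
-- This kinematic model is an assumption of the sketch.
wheelSpeeds :: Double -> Double -> Double -> (Double, Double)
wheelSpeeds l v w = (v - w * l / 2, v + w * l / 2)

-- Largest rotational velocity for which the faster wheel stays at or
-- below vMax: solve v + w*l/2 = vMax for w.
maxRotVel :: Double -> Double -> Double -> Double
maxRotVel l vMax v = 2 * (vMax - v) / l
```

For example, with l = 0.5, vMax = 2, and v = 1, maxRotVel gives 4.0, and wheelSpeeds 0.5 1 4 is (0.0, 2.0): the outer wheel runs exactly at the maximum while the inner wheel stands still. When v equals vMax the bound is zero, which matches the observation that the robot cannot turn at all at full forward speed.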

Exercise 7. Link rcBlind2, rcTurn, and rcStop together in the following way: 
Perform rcBlind2 for 2 seconds, then rcTurn for three seconds, and then do 
rcStop. (Hint: use after to generate an event after a given time interval.) 

The simbot talks (sort of). For something more interesting, let’s define 
a simbot that, whenever it gets stuck, reverses its direction and displays the 
message "Ouch!!" on the console:

rcReverse :: Velocity -> SimbotController
rcReverse v rps = beh `dSwitch` const (rcReverse (-v) rps)
  where beh = proc sbi -> do
          stuckE <- rsStuck -< sbi
          let mr = ddVelDiff v v `mrMerge`
                   tcoPrintMessage (tag stuckE "Ouch!!")
          returnA -< (mrFinalize mr, stuckE)

Note the use of a let binding within a proc: this is analogous to a let binding
within Haskell’s monadic do syntax. Note also that rcReverse is recursive -
this is how the velocity is reversed every time the simbot gets stuck - and
therefore requires the use of dSwitch to ensure that the recursion is well
founded. (It does not require the rec keyword, however, because the recursion
occurs outside of the proc expression.) The other reason for the dSwitch is
rather subtle: tcoPrintMessage uses stuckE to control when the message is
printed, but stuckE also controls the switch; thus if the switch happened
instantaneously, the message would be missed!

If preferred, it is not hard to write rcReverse without the arrow syntax:

rcReverse' v rps =
  (rsStuck >>> arr fun) `dSwitch` const (rcReverse' (-v) rps)
  where fun stuckE =
          let mr = ddVelDiff v v `mrMerge`
                   tcoPrintMessage (tag stuckE "Ouch!!")
          in (mrFinalize mr, stuckE)

Exercise 8. Write a version of rcReverse that, instead of knowing in advance 
what its velocity is, takes a “snapshot” of the velocity, as described in Sec. 2.7, 
at the moment the stuck event happens, and then negates this value to continue. 



Finding our way using odometry. Note from Fig. 3 that our simbots have 
odometry; that is, the ability of a robot to track its own location. This capability 
on a real robot can be approximated by so-called “dead reckoning,” in which the 
robot monitors its actual wheel velocities and keeps track of its position incre- 
mentally. Unfortunately, this is not particularly accurate, because of the errors 
that arise from wheel slippage, uneven terrain, and so on. A better technique is 
to use GPS (global positioning system), which uses satellite signals to determine
a vehicle’s position to within a few feet of accuracy. In our simulator we will 
assume that the simbot’s odometry is perfect. 

We can use odometry readings as feedback into a controller to stabilize and 
increase the accuracy of some desired action. For example, suppose we wish to 
move the simbot at a fixed speed in a certain direction. We can set the speed 
easily enough as shown in the examples above, but we cannot directly specify 
the direction. However, we can read the direction using the odometry function 
odometryHeading :: SimbotInput -> Heading and use this to control the
rotational velocity.

(A note about robot headings. In AFrob there are three data types that relate 
to headings: 

1. Heading is assumed to be in radians, and is aligned with the usual Cartesian
coordinate system, with 0 radians corresponding to the positive x-axis, $\pi/2$
the positive y-axis, and so on. Its normalized range is $[-\pi, \pi)$.

2. Bearing is assumed to be in degrees, and is aligned with a conventional 
compass, with 0 degrees corresponding to north, 90 degrees to east, and so 
on. Its normalized range is [0,360). 

3. Angle is assumed to be in radians, but is a relative measure rather than 
being aligned with something absolute. 

AFrob also provides conversion functions between bearings and headings:

bearingToHeading :: Bearing -> Heading
headingToBearing :: Heading -> Bearing

However, in this paper we only use headings and relative angles.) 
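
To make the two conventions concrete, the following sketch shows one way such conversions could be written. It is not AFrob's implementation: the names carry a Sketch suffix to mark them as hypothetical, and we assume that the positive x-axis points east and the positive y-axis points north.

```haskell
-- Wrap an angle in radians into the range [-pi, pi).
wrapHeading :: Double -> Double
wrapHeading a =
  a - 2 * pi * fromIntegral (floor ((a + pi) / (2 * pi)) :: Integer)

-- Wrap an angle in degrees into the range [0, 360).
wrapBearing :: Double -> Double
wrapBearing b = b - 360 * fromIntegral (floor (b / 360) :: Integer)

-- Compass bearings grow clockwise from north; headings grow
-- counter-clockwise from east (assumed to be the positive x-axis).
bearingToHeadingSketch :: Double -> Double
bearingToHeadingSketch b = wrapHeading (pi / 2 - b * pi / 180)

headingToBearingSketch :: Double -> Double
headingToBearingSketch h = wrapBearing (90 - h * 180 / pi)
```

Under these assumptions a bearing of 0 degrees (north) corresponds to a heading of $\pi/2$, and a heading of 0 (east) corresponds to a bearing of 90 degrees. Results are subject to the usual floating-point rounding.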
Getting back to our problem, if $h_d$ and $h_a$ are the desired and actual
headings in radians, respectively, then the heading error is just
$h_e = h_d - h_a$. If $h_e$ is positive, then we want to turn the robot in a
counter-clockwise direction (i.e. using a positive rotational velocity), and if
$h_e$ is negative, then we want to turn the robot in a clockwise direction
(i.e. using a negative rotational velocity). In other words, the rotational
velocity should be directly proportional to $h_e$ (this strategy is thus called
a proportionate controller). One small complication to this scheme is that we
need to normalize $h_d - h_a$ to keep the angle in the range $[-\pi, \pi)$.
This is easily achieved using Yampa’s normalizeAngle function. Here is the
complete controller:

rcHeading :: Velocity -> Heading -> SimbotController
rcHeading vel hd rps =
  let vMax = rpWSMax rps
      vel' = lim vMax vel
      k    = 2
  in proc sbi -> do
       let he = normalizeAngle (hd - odometryHeading sbi)
       let vel'' = (1 - abs he / pi) * vel'
       returnA -< mrFinalize (ddVelTR vel'' (k*he))

lim m y = max (-m) (min m y)

The parameter k is called the gain of the controller, and can be adjusted to give
a faster response, at the risk of being too fast and thus being unstable. The
function lim m y limits the maximum absolute value of y to m.
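
The effect of the gain can be explored outside the simulator with a crude discrete-time model. The sketch below is our own simplification, not part of AFrob: it assumes the robot's heading changes at exactly the commanded rotational velocity (no inertia, no velocity limits) and advances time in fixed steps of dt seconds.

```haskell
-- Wrap an angle in radians into the range [-pi, pi).
wrapHeading :: Double -> Double
wrapHeading a =
  a - 2 * pi * fromIntegral (floor ((a + pi) / (2 * pi)) :: Integer)

-- One Euler step of the heading under proportional control with gain k:
-- the commanded rotational velocity is k times the normalized error.
stepHeading :: Double -> Double -> Double -> Double -> Double
stepHeading k dt hd ha = ha + k * wrapHeading (hd - ha) * dt

-- Heading after n steps, starting from heading h0.
headingAfter :: Int -> Double -> Double -> Double -> Double -> Double
headingAfter n k dt hd h0 = iterate (stepHeading k dt hd) h0 !! n
```

With k = 2 and dt = 0.05, each step shrinks the error by a factor of 1 - k*dt = 0.9, so headingAfter 200 2 0.05 (pi/2) 0 is within rounding distance of $\pi/2$. In this model the error decays smoothly only while k*dt < 1; larger products make each step overshoot the target, a simple picture of the instability mentioned above.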

Before the next example we will first rewrite the above program in the fol- 
lowing way: 

rcHeading' :: Velocity -> Heading -> SimbotController
rcHeading' vel hd rps =
  proc sbi -> do
    rcHeadingAux rps -< (sbi, vel, hd)

rcHeadingAux :: SimbotProperties ->
                SF (SimbotInput, Velocity, Heading) SimbotOutput
rcHeadingAux rps =
  let vMax = rpWSMax rps
      k    = 2
  in proc (sbi, vel, hd) -> do
       let vel' = lim vMax vel
       let he = normalizeAngle (hd - odometryHeading sbi)
       let vel'' = (1 - abs he / pi) * vel'
       returnA -< mrFinalize (ddVelTR vel'' (k*he))

In the original definition, vel and hd were constant during the lifetime of the 
signal function, whereas in the second version they are treated as signals in 
rcHeadingAux, thus allowing for them to be time varying. Although not needed
in this example, we will need this capability below. 

As another example of using odometry, consider the task of moving the sim- 
bot to a specific location. We can do this by computing a trajectory from our 
current location to the desired location. By doing this continually, we ensure
that drift caused by imperfections in the robot, the floor surface, etc. does not
cause appreciable error.

The only complication is that we must take into account our simbot’s trans- 
lational inertia: if we don’t, we may overshoot the target. What we’d like to 
do is slow down as we approach the target (as for rcHeading, this amounts to 
designing a proportionate controller). Here is the code: 

rcMoveTo :: Velocity -> Position2 -> SimbotController
rcMoveTo vd pd rps = proc sbi -> do
  let (d,h) = vector2RhoTheta (pd .-. odometryPosition sbi)
      vel   = if d>2 then vd else vd*(d/2)
  rcHeadingAux rps -< (sbi, vel, h)

Note the use of vector arithmetic to compute the difference between the de- 
sired position pd and actual position odometryPosition sbi, and the use of 
vector2RhoTheta to convert the error vector into distance d and heading h. 
vel is the speed at which we will approach the target. Finally, note the use 
of rcHeadingAux defined above to move the simbot at the desired velocity and 
heading. 
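
The arithmetic inside rcMoveTo can be tried in isolation. The following sketch uses plain pairs of Doubles in place of AFrob's vector types and computes the distance and heading directly with sqrt and atan2; the function name approach is hypothetical.

```haskell
-- Given a cruising speed vd, a target position and the current position,
-- return the commanded speed and heading, mirroring rcMoveTo's arithmetic.
approach :: Double -> (Double, Double) -> (Double, Double) -> (Double, Double)
approach vd (tx, ty) (x, y) = (vel, h)
  where
    dx  = tx - x
    dy  = ty - y
    d   = sqrt (dx * dx + dy * dy)             -- distance to the target
    h   = atan2 dy dx                          -- heading toward the target
    vel = if d > 2 then vd else vd * (d / 2)   -- slow down within 2 m
```

For instance, approach 1 (1, 0) (0, 0) yields (0.5, 0.0): the target is one meter away along the positive x-axis, so the commanded speed is half the cruising speed. Beyond two meters the speed stays at vd.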

Exercise 9. rcMoveTo will behave a little bit funny once the simbot reaches its 
destination, because a differential drive robot is not able to maneuver well at 
slow velocities (compare the difficulty of parallel parking a car to the ease of 
switching lanes at high speed). Modify rcMoveTo so that once it gets reasonably
close to its target, it stops (using rcStop). 

Exercise 10. Define a controller to cause a robot to follow a sinusoidal path. 
(Hint: feed a sinusoidal signal into rcHeadingAux.) 

Exercise 11. Define a controller that takes a list of points and causes the robot 
to move to each point successively in turn. 

Exercise 12. (a) Define a controller that chases a ball. (Hint: use the aotBalls 
method in class HasAnimateObjectTracker to find the location of the ball.) (b) 
Once the ball is hit, the simulator will stop the robot and create an rsStuck 
event. Therefore, modify your controller so that it restarts the robot whenever 
it gets stuck, or perhaps backs up first and then restarts. 



Home on the range. Recall that our simbots have range finders that are able 
to determine the distance of the nearest object in a given direction. We will 
assume that there are four of these, one looking forward, one backward, one to 
the left, and one to the right: 
rfFront :: HasRangeFinder i => i -> Distance
rfBack  :: HasRangeFinder i => i -> Distance
rfLeft  :: HasRangeFinder i => i -> Distance
rfRight :: HasRangeFinder i => i -> Distance



These are intended to simulate four sonar sensors, except that they are far more 
accurate than a conventional sonar, which has a rather broad signal. They are 
more similar to the capability of a laser-based range finder. 

With a range finder we can do some degree of autonomous navigation in 
“unknown terrain.” That is, navigation in an area where we do not have a 
precise map. In such situations a certain degree of the navigation must be done 
based on local features that the robot “sees,” such as walls, doors, and other 
objects. 

For example, let’s define a controller that causes our simbot to follow a wall 
that is on its left. The idea is to move forward at a constant velocity v, and as 
the desired distance d from the wall varies from the left range finder reading r, 
adjustments are made to the rotational velocity $\omega$ to keep the simbot in line.
This task is not quite as simple as the previous ones, and for reasons that are 
beyond the scope of this paper, it is desirable to use what is known as a PD 
(for “proportionate/derivative”) controller, which means that the error signal is 
fed back proportionately and also as its derivative. More precisely, one can show 
that, for small deviations from the norm: 



$\omega = K_p (r - d) + K_d \frac{dr}{dt}$

$K_p$ and $K_d$ are the proportionate gain and derivative gain, respectively.
Generally speaking, the higher the gain, the better the response will be, but
care must be taken to avoid responding too quickly, which may cause
over-shooting the mark, or worse, unstable behavior that is oscillatory or that
diverges. It can be shown that the optimal relationship between $K_p$ and $K_d$
is given by:

$K_p = v K_d^2 / 4$

In the code below, we will set $K_d$ to 5. For pragmatic reasons we will also
put a limit on the absolute value of $\omega$ using the limiting function lim.

Assuming all of this mathematics is correct, then writing the controller is 
fairly straightforward: 



rcFollowLeftWall :: Velocity -> Distance -> SimbotController
rcFollowLeftWall v d _ = proc sbi -> do
  let r = rfLeft sbi
  dr <- derivative -< r
  let omega = kp*(r-d) + kd*dr
      kd    = 5
      kp    = v*(kd^2)/4
  returnA -< mrFinalize (ddVelTR v (lim 0.2 omega))
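
To get a feel for the numbers involved, the control law can be evaluated on individual range readings. The sketch below is a stand-alone approximation of our own: it replaces the continuous derivative with a backward difference over one time step dt, and the name pdOmega is hypothetical.

```haskell
-- Clamp y to the interval [-m, m].
lim :: Double -> Double -> Double
lim m y = max (-m) (min m y)

-- PD law for the wall follower: v is the forward velocity, d the desired
-- distance, rPrev and r the previous and current left range readings.
-- The derivative is approximated by (r - rPrev) / dt (an assumption).
pdOmega :: Double -> Double -> Double -> Double -> Double -> Double
pdOmega v d dt rPrev r = lim 0.2 (kp * (r - d) + kd * dr)
  where
    kd = 5
    kp = v * kd ^ (2 :: Int) / 4
    dr = (r - rPrev) / dt
```

With v = 1 the proportionate gain is 6.25. When the simbot is on track (r = rPrev = d) the result is 0; a reading that jumps from 1 m to 2 m within 0.1 s saturates at the limit of 0.2. Because the gains are large, the limit is reached for all but small deviations, which is consistent with the remark that the derivation holds only near the desired track.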
Exercise 13. Enhance the wall-follower controller so that it can make left and
right turns in a maze constructed only of horizontal and vertical walls. Specifi- 
cally: 

1. If the simbot sees a wall directly in front of itself, it should slow down as it 
approaches the wall, stopping at distance d from the wall. Then it should 
turn right and continue following the wall which should now be on its left. 
(This is an inside-corner right turn.) 

2. If the simbot loses track of the wall on its left, it continues straight ahead for 
a distance d, turns left, goes straight for distance d again, and then follows 
the wall which should again be on its left. (This is an outside-corner left 
turn.) 

Test your controller in an appropriately designed world template. 

Exercise 14. As mentioned in the derivation above, the rcFollowLeftWall
controller is only useful once the robot is close to being on track: i.e. at the
proper distance from the wall and at the proper heading. If the robot is too far
from the wall, it will tend to turn too much in trying to get closer, which
makes the left range finder see an even greater distance, and the system
becomes unstable. Designing a more robust wall follower is tricky business, and
is best treated as a multi-mode system, where the robot first seeks a wall,
aligns itself parallel to the wall, and then tries to follow it. Design such a
controller.



Mass hysteria. As mentioned earlier, the simulator can handle a number of 
simbots simultaneously. Groups of robots can exhibit all kinds of interesting and 
productive group behavior (or possibly mass hysteria), limited only by the clev- 
erness of you, the designer. We will describe one simple kind of group behavior 
here, leaving others (such as the soccer match described in Ex. 16) to you. 

The behavior that we will define is that of convergence. Assume that all sim- 
bots are initially moving in arbitrary directions and speeds. Each simbot will 
look at the positions of all of the others, and move toward the centroid (i.e. av- 
erage) of those positions. If each robot does this continuously and independently, 
they will all end up converging upon the same point. 

To achieve this, recall first the HasAnimateObjectTracker class: 

class HasAnimateObjectTracker i where
  aotOtherRobots :: i -> [(RobotType, RobotId, Angle, Distance)]
  aotBalls       :: i -> [(Angle, Distance)]

The first of these operations permits us to determine the angle and distance of 
each of the other simbots. By converting these measurements to vectors, we can 
add them and take their average, then use rcHeading to steer the robot toward 
the resulting centroid. 

Other than dealing with numeric conversions, the final code is fairly straight- 
forward: 
rcAlign :: Velocity -> SimbotController
rcAlign v rps = proc sbi -> do
  let neighbors = aotOtherRobots sbi
      vs        = map (\(_,_,a,d) -> vector2Polar d a) neighbors
      avg       = if vs==[] then zeroVector
                  else foldl1 (^+^) vs ^/ intToFloat (length vs)
      heading   = vector2Theta avg + odometryHeading sbi
  rcHeadingAux rps -< (sbi, v, heading)

intToFloat = fromInteger . toInteger

When observing the world through robot sensors, one should not make too 
many assumptions about what one is going to see, because noise, varying light 
conditions, occlusion, etc. can destroy those expectations. For example, in the 
case of the simbots, the simulator does not guarantee that all other robots will 
be visible through the animate object tracker. Indeed, at the very first time-step, 
none are visible. For reasons of causality, sensor data is delayed one time-step; 
but at the very first time step, there is no previous data to report, and thus the 
animate object tracker returns an empty list of other robots. This is why in the 
code above the list vs is tested for being empty. 
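
The centroid computation, including the empty-list case, can be written without any Yampa machinery. The sketch below uses plain pairs instead of AFrob's vector type and returns Nothing rather than a zero vector when no robots are visible; centroid and centroidHeading are hypothetical names.

```haskell
-- Average of the polar readings (angle, distance), as a Cartesian vector
-- in the robot's own frame; Nothing when no other robots are visible.
centroid :: [(Double, Double)] -> Maybe (Double, Double)
centroid [] = Nothing
centroid rs = Just (sum xs / n, sum ys / n)
  where
    xs = [d * cos a | (a, d) <- rs]
    ys = [d * sin a | (a, d) <- rs]
    n  = fromIntegral (length rs)

-- Absolute heading toward the centroid, given the robot's own heading h0.
centroidHeading :: Double -> [(Double, Double)] -> Maybe Double
centroidHeading h0 rs = fmap (\(x, y) -> atan2 y x + h0) (centroid rs)
```

Two robots straight ahead at 1 m and 3 m give a centroid 2 m ahead, so centroidHeading 0.5 [(0, 1), (0, 3)] is Just 0.5, while centroidHeading 0.5 [] is Nothing. Making the empty case explicit in the type forces the caller to decide what to do at the first time step.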

Exercise 15. Write a program for two simbots that are traveling in a straight 
path, except that their paths continually interleave each other, as in a braid of 
rope. (Hint: treat the velocities as vectors, and determine the proper equations 
for two simbots to circle one another while maintaining a specified distance. 
Then add these velocities to the simbots’ forward velocities to yield the desired 
behavior.) 

Exercise 16. Write a program to play “robocup soccer,” as follows. Using wall 
segments, create two goals at either end of the field. Decide on a number of 
players on each team, and write controllers for each of them. You may wish to 
write a couple of generic controllers, such as one for a goalkeeper, one for attack, 
and one for defense. Create an initial world where the ball is at the center mark, 
and each of the players is positioned strategically while being on-side (with the 
defensive players also outside of the center circle). Each team may use the same
controller, or different ones. Indeed, you can pit your controller-writing skills
against those of your friends (but we do not recommend betting money on the
game’s outcome). 
Abstract. XQuery is a typed, functional language for querying XML,
currently being designed by the XML Query Working Group of the 
World-Wide Web Consortium. Here are examples of XQuery queries on 
a suitable XML document describing books. To list titles of all books 
published before 2000 you might write: 



document("books.xml")/BOOKS/BOOK[@YEAR < 2000]/TITLE



To list the year and title of all books published before 2000 you might 
write: 



for $book in document("books.xml")/BOOKS/BOOK
where $book/@YEAR < 2000
return <BOOK>{ $book/@YEAR, $book/TITLE }</BOOK>



And to list for each author the titles of all books by that author you 
might write: 



let $books := document("books.xml")/BOOKS
for $author in distinct($books/BOOK/AUTHOR) return
  <AUTHOR NAME="{ $author }">{
    $books/BOOK[AUTHOR = $author]/TITLE
  }</AUTHOR>



1 Introduction 

XQuery is a typed, functional language for querying XML. These notes provide 
an introduction to XQuery and related XML standards. 

XQuery is currently being designed by the XML Query Working Group of 
the World-Wide Web Consortium (W3C). The design is currently in flux. The 
design is expressed in a number of documents, including a prose specification [1],
a formal semantics [2], and a library of functions and operators [3]. Introductions
to the formal semantics have been written by Fernandez, Simeon, and Wadler 
[16,15]. 
XQuery is closely related to other standards for XML. These include XML
itself [4,5], XML Namespaces [6], XML Schema [7], XML Stylesheet Transfor- 
mations (XSLT) [8,9], and XPath [10,11]. XPath includes a common core of 
material included in XQuery, XSLT, and another standard, XPointer [12], and 
the continued development of XPath is under the joint management of the XML 
Query and XML Stylesheet working groups. 

The XQuery standard includes multiple conformance levels: an XQuery im- 
plementation may choose whether or not to support XML Schema, and whether 
or not to enforce static typing. The discussion here is for a version of XQuery 
that supports XML Schema and static typing. 

The ideas presented here have been developed jointly by the XML Query 
Working Group. Special thanks are due to my colleagues Mary Fernandez and 
Jerome Simeon. 

Since the design of XQuery is in flux, consult the current version of the stan- 
dard for the latest version. All opinions expressed are my own. Other members 
of the XML Query Working Group may hold different opinions. (Some certainly 
do!) 



2 XQuery by example 

To get started, here are three examples of XQuery queries. Assume you have an 
XML document describing books — the format of this document is discussed 
further below. To list titles of all books published before 2000 you might write: 

document("books.xml")/BOOKS/BOOK[@YEAR < 2000]/TITLE

To list the year and title of all books published before 2000 you might write: 

for $book in document("books.xml")/BOOKS/BOOK
where $book/@YEAR < 2000
return <BOOK>{ $book/@YEAR, $book/TITLE }</BOOK>

And to list for each author the titles of all books by that author you might write: 

let $books := document("books.xml")/BOOKS
for $author in distinct($books/BOOK/AUTHOR) return
  <AUTHOR NAME="{ $author }">{
    $books/BOOK[AUTHOR = $author]/TITLE
  }</AUTHOR>



3 XQuery data model 



Here is a sample XML document describing books, suitable as input to the above 
queries. 
<BOOKS>
  <BOOK YEAR="1999 2003">
    <AUTHOR>Abiteboul</AUTHOR>
    <AUTHOR>Buneman</AUTHOR>
    <AUTHOR>Suciu</AUTHOR>
    <TITLE>Data on the Web</TITLE>
    <REVIEW>A <EM>fine</EM> book.</REVIEW>
  </BOOK>
  <BOOK YEAR="2002">
    <AUTHOR>Buneman</AUTHOR>
    <TITLE>XML in Scotland</TITLE>
    <REVIEW><EM>The <EM>best</EM> ever!</EM></REVIEW>
  </BOOK>
</BOOKS>

XML data tends to come in two styles, database-like and document-like. The 
above has aspects of both. This dual nature of XML is one of its more interesting 
aspects. (There is an old Saturday Night Live routine: “It’s a floor wax! It’s a 
dessert topping! It’s both!” XML is similar. “It’s a database! It’s a document! 
It’s both!”) 

XML data often resembles a database. Listing a year, an author, a title, and 
a review for each book is reminiscent of the columns in a relational database. 
However, the use of multiple author elements for a single book differs from the 
traditional relational approach, which would either have a single author entry 
with an array of authors, or a separate table relating books to authors. 
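
For readers who think in terms of functional programs, the database-like reading of the sample document, and the three queries of Sect. 2, can be mimicked with ordinary list comprehensions. This is only an analogy: the Book record and the function names below are hypothetical, and we assume that a comparison such as @YEAR < 2000 holds when at least one of the listed years satisfies it.

```haskell
import Data.List (nub)

-- Hypothetical in-memory stand-in for the books document.
data Book = Book
  { bookYears   :: [Int]
  , bookAuthors :: [String]   -- several authors per book, as in the XML
  , bookTitle   :: String
  } deriving (Eq, Show)

sampleBooks :: [Book]
sampleBooks =
  [ Book [1999, 2003] ["Abiteboul", "Buneman", "Suciu"] "Data on the Web"
  , Book [2002]       ["Buneman"]                       "XML in Scotland"
  ]

-- First query: titles of books with some year before the cutoff.
titlesBefore :: Int -> [Book] -> [String]
titlesBefore y bs = [bookTitle b | b <- bs, any (< y) (bookYears b)]

-- Second query: years and title of the same books.
yearsAndTitlesBefore :: Int -> [Book] -> [([Int], String)]
yearsAndTitlesBefore y bs =
  [(bookYears b, bookTitle b) | b <- bs, any (< y) (bookYears b)]

-- Third query: for each distinct author, the titles by that author.
titlesByAuthor :: [Book] -> [(String, [String])]
titlesByAuthor bs =
  [ (a, [bookTitle b | b <- bs, a `elem` bookAuthors b])
  | a <- nub (concatMap bookAuthors bs)
  ]
```

Here titlesBefore 2000 sampleBooks is ["Data on the Web"], and titlesByAuthor sampleBooks lists Buneman with both titles. The analogy covers only the database-like side of the data; the mixed content of the reviews has no natural place in such a record.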

XML data often resembles a document. Using markup to indicate emphasis 
in a review is typical of documents such as HTML. Note the recursive use of 
emphasis in the second review, where an enthusiastic reviewer has marked the 
entire review as emphasized, then further emphasized one word of it. 

XML is a notation for writing trees. Below is the representation we use to 
describe the tree corresponding to the XML document above, after validation 
against its XML Schema, which is given below. 

document {
  element BOOKS of type BOOKS-TYPE {
    element BOOK of type BOOK-TYPE {
      attribute YEAR of type INTEGER-LIST { 1999, 2003 },
      element AUTHOR of type xs:string { "Abiteboul" },
      element AUTHOR of type xs:string { "Buneman" },
      element AUTHOR of type xs:string { "Suciu" },
      element TITLE of type xs:string { "Data on the Web" },
      element REVIEW of type INLINE {
        text { "A " },
        element EM of type INLINE { text { "fine" } },
        text { " book." }
      }
    },
    element BOOK of type BOOK-TYPE {
      attribute YEAR of type INTEGER-LIST { 2002 },
      element AUTHOR of type xs:string { "Buneman" },
      element TITLE of type xs:string { "XML in Scotland" },
      element REVIEW of type INLINE {
        element EM of type INLINE {
          text { "The " },
          element EM of type INLINE { text { "best" } },
          text { " ever!" }
        }
      }
    }
  }
}

Here the leaves of the tree are either strings (enclosed in quotes) or integers (not 
in quotes), and the nodes of the tree are labeled as document, element, attribute, 
or text nodes. Each element and attribute node is labeled with a type; the source 
of these types is the XML Schema. 
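
The shape of this tree can be captured by a small algebraic data type. The following is a sketch of our own, not the data model defined by the XQuery specifications: the constructor names are hypothetical, and type labels are kept as plain strings.

```haskell
-- A simplified model of the tree notation used above.
data Node
  = Document [Node]
  | Element String String [Node]     -- name, type label, children
  | Attribute String String [Node]   -- name, type label, value leaves
  | Text String                      -- text node
  | IntLeaf Integer                  -- integer leaf (not in quotes)
  | StrLeaf String                   -- string leaf (in quotes)
  deriving (Eq, Show)

-- The REVIEW element of the second book.
review2 :: Node
review2 =
  Element "REVIEW" "INLINE"
    [ Element "EM" "INLINE"
        [ Text "The "
        , Element "EM" "INLINE" [Text "best"]
        , Text " ever!"
        ]
    ]

-- Concatenate the text and string leaves below a node, ignoring
-- attributes and integer leaves.
textOf :: Node -> String
textOf (Document ns)     = concatMap textOf ns
textOf (Element _ _ ns)  = concatMap textOf ns
textOf (Attribute _ _ _) = ""
textOf (Text s)          = s
textOf (IntLeaf _)       = ""
textOf (StrLeaf s)       = s
```

For the nested emphasis in the second review, textOf review2 is "The best ever!". The recursion in textOf follows the recursion in the data: an EM element may itself contain EM elements to any depth.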

Since the purpose of XML is as a notation for data interchange, one would 
expect the mapping from XML into the corresponding tree to be trivial. Alas, it 
is not. In the above, we ignored whitespace between elements. How can one tell 
where whitespace is and is not significant? We mapped the first YEAR attribute 
to a list of two integers (1999, 2003). How can one know that this is the correct
interpretation, rather than a string ("1999 2003"), or a list of two strings
("1999", "2003")? This information, too, comes from the XML Schema, which
we describe next. 

4 XML Schema 

The expected format of an XML document can be described with an XML 
Schema. From the schema one can determine the expected structure of the 
markup, and what datatype (if any) is associated with character data. 

Typically, XML input consists of both a document and a schema that should 
be used to validate that document. Validation does three things. First, it checks 
that the document has the format indicated by the schema. Second, it labels the 
internal representation of the document with the type specified by the schema. 
Among other things, this resolves the questions of determining when whitespace 
is significant, and of distinguishing between strings and integers. Third, it may 
supply default values for omitted attributes. (We don’t deal with the third point 
further here.) 
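
As a small illustration of the second point, consider how the character data of the YEAR attribute might be turned into a typed value once the schema says it is a whitespace-separated list of integers. The sketch below is a simplification under our own assumptions: parseIntegerList is a hypothetical name, and Haskell's reading of integers differs in detail from the lexical rules of XML Schema.

```haskell
import Text.Read (readMaybe)

-- Interpret character data as a whitespace-separated list of integers.
-- The result is Nothing if any item fails to parse as an integer.
parseIntegerList :: String -> Maybe [Integer]
parseIntegerList = traverse readMaybe . words
```

With this definition, parseIntegerList "1999 2003" is Just [1999,2003], whereas parseIntegerList "19x9" is Nothing, which corresponds to a validation failure. Without the schema, nothing in the document itself says that this reading is preferred over a single string.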

In many contexts, the intended XML Schema is known, and only the docu- 
ment is supplied. There are also conventions by which an XML document can 
indicate an associated XML Schema. It may also be that there is no Schema;
this is discussed further below.

Here is a Schema for the document described in the preceding section.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="BOOKS" type="BOOKS-TYPE"/>
  <xs:complexType name="BOOKS-TYPE">
    <xs:sequence>
      <xs:element name="BOOK" type="BOOK-TYPE"
                  minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="BOOK-TYPE">
    <xs:sequence>
      <xs:element name="AUTHOR" type="xs:string"
                  minOccurs="1" maxOccurs="unbounded"/>
      <xs:element name="TITLE" type="xs:string"/>
      <xs:element name="REVIEW" type="INLINE"
                  minOccurs="0" maxOccurs="1"/>
    </xs:sequence>
    <xs:attribute name="YEAR" type="INTEGER-LIST"
                  use="optional"/>
  </xs:complexType>
  <xs:complexType name="INLINE" mixed="true">
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element name="EM" type="INLINE"/>
      <xs:element name="BOLD" type="INLINE"/>
    </xs:choice>
  </xs:complexType>
  <xs:simpleType name="INTEGER-LIST">
    <xs:list itemType="xs:integer"/>
  </xs:simpleType>
</xs:schema>

Validating the document of the previous section against the above schema yields 
the data model presented in the previous section. 

The above schema contains one element declaration, three complex type dec- 
larations, and one simple type declaration. 

— The BOOKS element has type BOOKS-TYPE. 

— The BOOKS-TYPE type contains a sequence of zero or more elements with 
name BOOK of type BOOK-TYPE. 

— The BOOK-TYPE type contains a sequence consisting of: 

• One or more AUTHOR elements of type string.

• A TITLE element of type string.

• An optional REVIEW element of type INLINE. 

• An optional YEAR attribute of type INTEGER-LIST. 

— The INLINE type contains text nodes and any number of either EM or BOLD 
elements, both of which themselves have type INLINE. 

— The INTEGER-LIST type contains a sequence of (zero or more) integers.

The XQuery formal semantics includes an alternative notation for representing
schemas, which is more readable, more compact, and more uniform. The
above schema is written in this notation as follows. 

define element BOOKS of type BOOKS-TYPE

define type BOOKS-TYPE {
  element BOOK of type BOOK-TYPE *
}

define type BOOK-TYPE {
  attribute YEAR of type INTEGER-LIST ? ,
  element AUTHOR of type xs:string + ,
  element TITLE of type xs:string ,
  element REVIEW of type INLINE ?
}

define type INLINE mixed {
  ( element EM of type INLINE |
    element BOLD of type INLINE ) *
}

define type INTEGER-LIST {
  xs:integer *
}

The formal semantics notation utilizes a number of conventions familiar from
regular expressions: comma (,) for sequencing, bar (|) for alternation, query (?)
for optional, plus (+) for one or more, star (*) for zero or more. In the above,
the line element BOOK of type BOOK-TYPE * is parsed as (element BOOK of
type BOOK-TYPE) *.

The formal semantics notation is more uniform than Schema notation.
Schema uses minOccurs and maxOccurs to indicate whether an element is
optional or repeated, uses optional or required to indicate whether an attribute
is optional, and uses list to indicate that a value of simple type is repeated.
The formal semantics notation uses regular expression occurrence indicators (?,
+, *) for all these purposes.
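
The correspondence between Schema occurrence bounds and the regular expression indicators can be sketched as a lookup table in Python. The function name occurrence_indicator is hypothetical, and the sketch covers only the four combinations that a single indicator can express.

```python
from typing import Optional

def occurrence_indicator(min_occurs: int, max_occurs: Optional[int]) -> str:
    """Map a (minOccurs, maxOccurs) pair to '', '?', '+' or '*'.

    max_occurs=None stands for maxOccurs="unbounded".
    """
    table = {
        (1, 1): "",       # exactly one: no indicator
        (0, 1): "?",      # optional
        (1, None): "+",   # one or more
        (0, None): "*",   # zero or more
    }
    if (min_occurs, max_occurs) not in table:
        raise ValueError("no single occurrence indicator for this range")
    return table[(min_occurs, max_occurs)]

book_indicator = occurrence_indicator(0, None)     # BOOK in BOOKS-TYPE
author_indicator = occurrence_indicator(1, None)   # AUTHOR in BOOK-TYPE
title_indicator = occurrence_indicator(1, 1)       # TITLE in BOOK-TYPE
review_indicator = occurrence_indicator(0, 1)      # REVIEW in BOOK-TYPE
```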

There is an older way to specify the structure of XML documents, called a 
Document Type Definition (DTD), which is part of the original XML specifica-
tion [4, 5]. There are also a number of alternative proposals for specifying the
structure and datatypes of XML documents, notably Relax NG [13, 14]. Both 
DTDs and the compact syntax for Relax NG use a regular expression notation 
similar to that in the XQuery formal semantics. 

Part of the XQuery formal semantics is a detailed explanation of validation, 
and its relationship to type matching. For an introduction to this theory (in- 
cluding a surprisingly simple theorem that relates validation to type matching) 
see Simeon and Wadler’s POPL paper [16]. 

5 Projection 

Here is a query that lists all authors of all books.

document("books.xml")/BOOKS/BOOK/AUTHOR
=>
<AUTHOR>Abiteboul</AUTHOR>,
<AUTHOR>Buneman</AUTHOR>,
<AUTHOR>Suciu</AUTHOR>,
<AUTHOR>Buneman</AUTHOR>
∈
element AUTHOR of type xs:string *

This follows the style we will use to list a query, its result, and the static type
inferred for its result.

There is a second way to express the same query using explicit iteration.

document("books.xml")/BOOKS/BOOK/AUTHOR
=
let $root := document("books.xml") return
  for $dot1 in $root/BOOKS return
    for $dot2 in $dot1/BOOK return
      $dot2/AUTHOR

Note that an associative law applies to both notations. For the XPath slash
notation we have:

BOOKS/(BOOK/AUTHOR) = (BOOKS/BOOK)/AUTHOR

And for the for notation we have:

for $dot1 in $root/BOOKS return
  for $dot2 in $dot1/BOOK return
    $dot2/AUTHOR
=
for $dot2 in (
  for $dot1 in $root/BOOKS return
    $dot1/BOOK
) return
  $dot2/AUTHOR
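
The associative law can be checked on the sample data with a short Python sketch using the standard xml.etree.ElementTree module. The embedded document is an abridged copy of the running example (reviews omitted), and list comprehensions stand in for the for expressions; this is an analogy, not an XQuery implementation.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the running example; the REVIEW elements are left out.
BOOKS_XML = """
<BOOKS>
  <BOOK YEAR="1999 2003">
    <AUTHOR>Abiteboul</AUTHOR><AUTHOR>Buneman</AUTHOR><AUTHOR>Suciu</AUTHOR>
    <TITLE>Data on the Web</TITLE>
  </BOOK>
  <BOOK YEAR="2002">
    <AUTHOR>Buneman</AUTHOR>
    <TITLE>XML in Scotland</TITLE>
  </BOOK>
</BOOKS>
"""
root = ET.fromstring(BOOKS_XML)   # root is the BOOKS element

# for $dot1 in BOOKS return (for $dot2 in $dot1/BOOK return $dot2/AUTHOR)
nested = [a.text
          for book in root.findall("BOOK")
          for a in book.findall("AUTHOR")]

# for $dot2 in (for $dot1 in BOOKS return $dot1/BOOK) return $dot2/AUTHOR
books = [book for book in root.findall("BOOK")]
regrouped = [a.text for book in books for a in book.findall("AUTHOR")]

# The path form BOOK/AUTHOR, evaluated relative to the BOOKS element.
path = [a.text for a in root.findall("BOOK/AUTHOR")]
```

All three lists contain the same four author names in the same order.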

6 Selection 

Here is a query that lists titles of all books published before 2000.

document("books.xml")/BOOKS/BOOK[@YEAR < 2000]/TITLE
=>
<TITLE>Data on the Web</TITLE>
∈
element TITLE of type xs:string *

Note that the @YEAR attribute is bound to a sequence of integers, and that the
expression @YEAR < 2000 returns true if some integer in the sequence is smaller
than 2000.

Again, there is a second way to express the same query.

document("books.xml")/BOOKS/BOOK[@YEAR < 2000]/TITLE
=
for $book in document("books.xml")/BOOKS/BOOK
where $book/@YEAR < 2000
return $book/TITLE

The where clause in the above may be re-expressed as a conditional.

for $book in document("books.xml")/BOOKS/BOOK
where $book/@YEAR < 2000
return $book/TITLE
=
for $book in /BOOKS/BOOK return
  if $book/@YEAR < 2000 then $book/TITLE else ()

There is also a second way to express the comparison, which makes the
existential explicit.

$book/@YEAR < 2000
=
some $year in $book/@YEAR satisfies $year < 2000

The existential can itself be expressed in terms of iteration and selection.

some $year in $book/@YEAR satisfies $year < 2000
=
not(empty(
  for $year in $book/@YEAR where $year < 2000 return $year
))
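
The two formulations of the existential comparison can be mirrored in Python, with lists of integers standing in for the typed value of the YEAR attribute. The function names are hypothetical; the sketch only illustrates that the two forms agree.

```python
def some_less_than(years, bound):
    """some $year in years satisfies $year < bound"""
    return any(year < bound for year in years)

def via_selection(years, bound):
    """not(empty(for $year in years where $year < bound return $year))"""
    selected = [year for year in years if year < bound]
    return len(selected) != 0

first_book_years = [1999, 2003]   # YEAR="1999 2003"
second_book_years = [2002]        # YEAR="2002"
no_years = []                     # YEAR attribute absent
```

An absent attribute yields the empty sequence, for which both forms return false.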

Combining all the previous laws allows one to expand the original expression
into a larger expression in a smaller language.

document("books.xml")/BOOKS/BOOK[@YEAR < 2000]/TITLE
=
let $root := document("books.xml") return
  for $books in $root/BOOKS return
    for $book in $books/BOOK return
      if (
        not(empty(
          for $year in $book/@YEAR return
            if $year < 2000 then $year else ()
        ))
      ) then
        $book/TITLE
      else
        ()

7 Static typing issues 

The static type associated with an expression may be too broad or too narrow. 
Here is a query to list all books with the title “Data on the Web”.

document("books.xml")/BOOKS/BOOK[TITLE = "Data on the Web"]
=>
<BOOK YEAR="1999 2003">
  <AUTHOR>Abiteboul</AUTHOR>
  <AUTHOR>Buneman</AUTHOR>
  <AUTHOR>Suciu</AUTHOR>
  <TITLE>Data on the Web</TITLE>
  <REVIEW>A <EM>fine</EM> book.</REVIEW>
</BOOK>
∈
element BOOK of type BOOK-TYPE *

Here the inferred type is too broad. It indicates that there will be zero or more 
books, when in fact one might expect that there should be at most one book 
with a given title; or even exactly one book, if we know we have supplied a valid 
title. Understanding how to exploit information about keys and foreign keys in 
the type system is an important open issue. 

When the statically inferred type is too broad, it may be narrowed using a 
“treat as” expression. 

treat as element BOOK of type BOOK-TYPE (
  document("books.xml")/BOOKS/BOOK[TITLE = "Data on the Web"]
)
∈
element BOOK of type BOOK-TYPE

The purpose served by “treat as” expressions in XQuery is similar to that served 
by casting in languages such as Java and C++. 

For convenience, there is also a built-in function that indicates that a result 
sequence will have length one. 

one(/BOOKS/BOOK[TITLE = "Data on the Web"])
∈
element BOOK of type BOOK-TYPE

This allows the type to be inferred, rather than requiring all the type
information to be repeated. There are three similar convenience functions: one(),
zeroOrOne(), oneOrMore(). (As of this writing, the convenience functions have
not yet been approved by the XML Query Working Group.)

The type associated with an iteration may also be broader than you might
expect. Say we define two different elements to represent books supplied by two 
different vendors, and a catalogue containing all books from the first vendor 
followed by all books from the second. 

define element AMAZON-BOOK of type BOOK-TYPE
define element BN-BOOK of type BOOK-TYPE
define element CATALOGUE of type CATALOGUE-TYPE
define type CATALOGUE-TYPE {
  element AMAZON-BOOK * , element BN-BOOK *
}

Here is a query to list all books in the catalogue with Buneman as an author.

let $catalogue := document("catalogue.xml")/CATALOGUE
for $book in ($catalogue/AMAZON-BOOK, $catalogue/BN-BOOK)
where $book/AUTHOR = "Buneman"
return $book
∈
( element AMAZON-BOOK | element BN-BOOK ) *
⊇
element AMAZON-BOOK * , element BN-BOOK *

The typing rule for iteration assumes that the type of the bound variable is an 
alternation of elements. Here, the bound variable $book is given type 

element AMAZON-BOOK | element BN-BOOK

and hence the type of the iteration is as shown. This loses the information that all
of the books from the first vendor will precede books from the second vendor. If
this information is important, it may be recovered by use of a suitable “treat as”
expression, which will test at run-time that the value has the expected structure.

treat as type CATALOGUE-TYPE (
  let $catalogue := document("catalogue.xml")/CATALOGUE
  for $book in ($catalogue/AMAZON-BOOK, $catalogue/BN-BOOK)
  where $book/AUTHOR = "Buneman"
  return $book
)
∈
element AMAZON-BOOK * , element BN-BOOK *

The best trade-off between simplicity in the definition of iteration and accuracy 
of the inferred types is an important open issue. 
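
Because these types are regular expressions over element names, the difference between the inferred type and the more precise one can be illustrated with Python's re module. The one-letter encoding of element names is an assumption made for this sketch only.

```python
import re

def encode(tags):
    """Encode a sequence of element names as a string, one letter per element."""
    letters = {"AMAZON-BOOK": "A", "BN-BOOK": "B"}
    return "".join(letters[tag] for tag in tags)

INFERRED = re.compile(r"(A|B)*")   # ( element AMAZON-BOOK | element BN-BOOK ) *
PRECISE = re.compile(r"A*B*")      # element AMAZON-BOOK * , element BN-BOOK *

def matches(pattern, tags):
    """True if the whole sequence of element names matches the pattern."""
    return pattern.fullmatch(encode(tags)) is not None

in_vendor_order = ["AMAZON-BOOK", "AMAZON-BOOK", "BN-BOOK"]
interleaved = ["BN-BOOK", "AMAZON-BOOK"]
```

Every sequence accepted by the precise type is accepted by the inferred type, but not the reverse: the interleaved sequence matches only the inferred one. A run-time check of this kind is what a “treat as” expression would perform.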

8 Construction 



Here is a query to list the year and title of all books published before 2000.

for $book in document("books.xml")/BOOKS/BOOK
where $book/@YEAR < 2000
return <BOOK>{ $book/@YEAR, $book/TITLE }</BOOK>
=>
<BOOK YEAR="1999 2003">
  <TITLE>Data on the Web</TITLE>
</BOOK>
∈
element BOOK {
  attribute YEAR { integer + } ,
  element TITLE { string }
} *

XQuery actually provides two notations for element and attribute construc-
tion. The “physical” notation looks like XML; the “logical” notation emphasizes
the underlying tree structure.

<BOOK>{ $book/@YEAR , $book/TITLE }</BOOK>
=
element BOOK { $book/@YEAR , $book/TITLE }

The XML-like notation nests arbitrarily deep, allowing braces to splice in val-
ues or nodes inside attributes or elements.

<BOOK YEAR="{ data($book/@YEAR) }">
  <TITLE>{ data($book/TITLE) }</TITLE>
</BOOK>
=
element BOOK {
  attribute YEAR { data($book/@YEAR) },
  element TITLE { data($book/TITLE) }
}



The logical notation provides a way to construct an attribute in isolation, 
which is not possible in the physical notation. 

for $book in document("books.xml")/BOOKS/BOOK
return
  <BOOK>{
    if empty($book/@YEAR) then
      attribute YEAR { 2000 }
    else
      $book/@YEAR ,
    $book/TITLE
  }</BOOK>

The logical notation also provides a way to compute the name of an element
or attribute, which will be demonstrated in Section 15.

9 Grouping

A common operation for databases is grouping. In the relational world, this 
often requires special support, such as the “group by” clause in SQL. The nested 
structure of XQuery supports grouping naturally. 

Here is a query that lists for each author the titles of all books by that author. 



let $books := document("books.xml")/BOOKS
for $author in distinct($books/BOOK/AUTHOR) return
  <AUTHOR NAME="{ $author }">{
    $books/BOOK[AUTHOR = $author]/TITLE
  }</AUTHOR>
=>
<AUTHOR NAME="Abiteboul">
  <TITLE>Data on the Web</TITLE>
</AUTHOR>,
<AUTHOR NAME="Buneman">
  <TITLE>Data on the Web</TITLE>
  <TITLE>XML in Scotland</TITLE>
</AUTHOR>,
<AUTHOR NAME="Suciu">
  <TITLE>Data on the Web</TITLE>
</AUTHOR>
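
The same grouping can be sketched in Python over an abridged copy of the sample data. The helper distinct is hypothetical and keeps the first occurrence of each value; whether XQuery's distinct guarantees that order is not stated in the text, so the ordering here is an assumption.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the running example; the REVIEW elements are left out.
BOOKS_XML = """
<BOOKS>
  <BOOK YEAR="1999 2003">
    <AUTHOR>Abiteboul</AUTHOR><AUTHOR>Buneman</AUTHOR><AUTHOR>Suciu</AUTHOR>
    <TITLE>Data on the Web</TITLE>
  </BOOK>
  <BOOK YEAR="2002">
    <AUTHOR>Buneman</AUTHOR>
    <TITLE>XML in Scotland</TITLE>
  </BOOK>
</BOOKS>
"""
root = ET.fromstring(BOOKS_XML)

def distinct(values):
    """Remove duplicates, keeping the first occurrence of each value."""
    seen = []
    for value in values:
        if value not in seen:
            seen.append(value)
    return seen

authors = distinct(a.text for a in root.findall("BOOK/AUTHOR"))

# For each author, the titles of the books listing that author:
# $books/BOOK[AUTHOR = $author]/TITLE
titles_by_author = {
    author: [book.findtext("TITLE")
             for book in root.findall("BOOK")
             if author in [a.text for a in book.findall("AUTHOR")]]
    for author in authors
}
```

Since each author is drawn from some book, every group contains at least one title, which is the fact the inferred type fails to capture.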



Grouping provides another example where the inferred type may be too 
broad. 



let $books := document("books.xml")/BOOKS
for $author in distinct($books/BOOK/AUTHOR) return
  <AUTHOR NAME="{ $author }">{
    $books/BOOK[AUTHOR = $author]/TITLE
  }</AUTHOR>
∈
element AUTHOR {
  attribute NAME { string },
  element TITLE { string } *
}
⊇
element AUTHOR {
  attribute NAME { string },
  element TITLE { string } +
}

As before, this may be fixed using a “treat as” expression, or using the conve-
nience function oneOrMore().

10 Join

Another common operation for databases is to join data from two relations. 
Indeed, efficient expression and optimization of joins is central to the power and 
popularity of databases. 

Here is a revised type declaration for books. 

define element BOOKS {
  element BOOK *
}

define element BOOK {
  element TITLE of type xs:string ,
  element PRICE of type xs:decimal ,
  element ISBN of type xs:string
}

Assume that Amazon and Barnes and Noble make available data in this format. 
Here is a query that lists all books that are more expensive at Amazon than at 
Barnes and Noble. 

let $am := document("http://www.amazon.com/books.xml")/BOOKS,
    $bn := document("http://www.bn.com/books.xml")/BOOKS
for $a in $am/BOOK,
    $b in $bn/BOOK
where $a/ISBN = $b/ISBN
  and $a/PRICE > $b/PRICE
return <BOOK>{ $a/TITLE, $a/PRICE, $b/PRICE }</BOOK>

(Because it will be easy to formulate such queries, it may be a while before 
vendors make data available in such formats.) 

If a similar query was formulated for a relational database, it might be im- 
plemented by sorting the Amazon books and the Barnes and Noble books in 
order of ISBN, then merging the resulting lists and checking the prices. It is 
difficult to apply this optimization to the query above because order is signifi- 
cant in XQuery. The way in which the query is written specifies that the books 
should be presented in the same order that they appear in the Amazon database. 
Reversing the two “for” clauses would specify that they should be in the same 
order as in the Barnes and Noble database. 
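
The tension between the sort-merge strategy and significant order can be seen in a small Python sketch. The two catalogues below are invented for illustration (tuples of ISBN, title, price), and the sort-merge version assumes each ISBN appears at most once per catalogue.

```python
# Hypothetical catalogues: (ISBN, title, price). All values are invented.
amazon = [("isbn-2", "B", 30.0), ("isbn-1", "A", 25.0), ("isbn-3", "C", 10.0)]
bn = [("isbn-1", "A", 20.0), ("isbn-3", "C", 12.0), ("isbn-2", "B", 28.0)]

def nested_loop_join(am, bn):
    """Results follow the order of the Amazon list, as the query specifies."""
    return [(title, pa, pb)
            for (ia, title, pa) in am
            for (ib, _, pb) in bn
            if ia == ib and pa > pb]

def sort_merge_join(am, bn):
    """Sort both inputs by ISBN, then merge. Assumes unique ISBNs per list."""
    am_sorted, bn_sorted = sorted(am), sorted(bn)
    i = j = 0
    out = []
    while i < len(am_sorted) and j < len(bn_sorted):
        (ia, title, pa), (ib, _, pb) = am_sorted[i], bn_sorted[j]
        if ia < ib:
            i += 1
        elif ia > ib:
            j += 1
        else:
            if pa > pb:
                out.append((title, pa, pb))
            i += 1
            j += 1
    return out

in_query_order = nested_loop_join(amazon, bn)
in_isbn_order = sort_merge_join(amazon, bn)
```

Both functions find the same books, but only the nested loop returns them in Amazon's order; the merge returns them in ISBN order, which is acceptable only if the order of the result does not matter.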

In fact, the user may not care about the order in which the results are com- 
puted, and may wish to give the XQuery implementation flexibility to choose 
an order that can be computed efficiently. This may be specified by using the 
unordered expression. 

unordered(
  for $a in $am/BOOK,
      $b in $bn/BOOK
  where $a/ISBN = $b/ISBN
    and $a/PRICE > $b/PRICE
  return <BOOK>{ $a/TITLE, $a/PRICE, $b/PRICE }</BOOK>
)

In general, the expression unordered(Expr) may return any permutation of the
sequence returned by Expr.

Often, the user wants the result to be sorted in a particular order. In the 
query above, one may want the answer to be sorted with the titles in alphabetic 
order. 

for $a in $am/BOOK,
    $b in $bn/BOOK
where $a/ISBN = $b/ISBN
  and $a/PRICE > $b/PRICE
order by $a/TITLE
return <BOOK>{ $a/TITLE, $a/PRICE, $b/PRICE }</BOOK>

Whenever a sequence is sorted, the order of the original sequence is irrelevant 
(unless the sort is required to be stable) . Opportunities for optimization can be 
expressed by introducing unordered expressions, and pushing such expressions 
into the computation. 

for $a in $am/BOOK,
    $b in $bn/BOOK
where $a/ISBN = $b/ISBN
  and $a/PRICE > $b/PRICE
order by $a/TITLE
return <BOOK>{ $a/TITLE, $a/PRICE, $b/PRICE }</BOOK>
=
for $x in
  unordered(
    for $a in $am/BOOK,
        $b in $bn/BOOK
    where $a/ISBN = $b/ISBN
      and $a/PRICE > $b/PRICE
    return <BOOK>{ $a/TITLE, $a/PRICE, $b/PRICE }</BOOK>
  )
order by $x/TITLE
return $x
=
for $x in
  unordered(
    for $a in unordered( $am/BOOK ),
        $b in unordered( $bn/BOOK )
    where $a/ISBN = $b/ISBN
      and $a/PRICE > $b/PRICE
    return <BOOK>{ $a/TITLE, $a/PRICE, $b/PRICE }</BOOK>
  )
order by $x/TITLE
return $x

For some queries that compute a join over a database it is desirable to include 
some data that does not appear in both relations. In SQL this is called a “left 
outer join”, and SQL includes special statements to support computing such 
joins. In XQuery, this may be specified using operations that we have already 
discussed. 

Here is a query that lists all books available from Amazon and from Barnes 
and Noble, followed by all books available from Amazon only. 

for $a in $am/BOOK,
    $b in $bn/BOOK
where $a/ISBN = $b/ISBN
return <BOOK>{ $a/TITLE, $a/PRICE, $b/PRICE }</BOOK>
,
for $a in $am/BOOK
where not($a/ISBN = $bn/BOOK/ISBN)
return <BOOK>{ $a/TITLE, $a/PRICE }</BOOK>
∈
element BOOK { TITLE, PRICE, PRICE } *
,
element BOOK { TITLE, PRICE } *

11 Nulls and three-valued logic

We don’t always know everything: sometimes data is missing. In recognition of 
this SQL supports a special “null” value. In XML, one may support missing data 
by simply making the associated element or attribute optional. (XML Schema 
also supports a special xsi:nil attribute, but we won’t go into that here.) 

The arithmetic operations of XQuery are designed to make it easy to operate 
on potentially missing data. In XQuery the arithmetic operations expect each 
argument to be either a number or the empty sequence, and if either argument 
is the empty sequence then the result is the empty sequence. The design is 
motivated, in part, by a desire to mimic the behaviour of arithmetic operators 
in SQL when passed null data. 

Here is yet another set of declarations for books. 

define element BOOKS { element BOOK * }
define element BOOK {
  element TITLE of type xs:string ,
  element PRICE of type xs:decimal ,
  element SHIPPING of type xs:decimal ?
}

Here is some data matching the above.

<BOOKS>
  <BOOK>
    <TITLE>Data on the Web</TITLE>
    <PRICE>40.00</PRICE>
    <SHIPPING>10.00</SHIPPING>
  </BOOK>
  <BOOK>
    <TITLE>XML in Scotland</TITLE>
    <PRICE>45.00</PRICE>
  </BOOK>
</BOOKS>

Here is a query that lists all books with total cost $50.00.

for $book in document("books.xml")/BOOKS/BOOK
where $book/PRICE + $book/SHIPPING = 50.00
return $book/TITLE
=>
<TITLE>Data on the Web</TITLE>

If the shipping is missing, then the total cost is unknown, and hence cannot be
equal to $50.00. That is, we have 45.00 + () => () and () = 50.00 => false().

For convenience, there is a function ifAbsent(x, y) that makes it easy to
supply a default value. This function returns the value of x, unless x is the empty
sequence, in which case it returns y.

Here is a query that lists all books with total cost $50.00, where a missing
shipping cost is assumed to be $5.00.

for $book in /BOOKS/BOOK
where $book/PRICE + ifAbsent($book/SHIPPING, 5.00) = 50.00
return $book/TITLE
=>
<TITLE>Data on the Web</TITLE>,
<TITLE>XML in Scotland</TITLE>
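
The behaviour of arithmetic and comparison on missing data can be imitated in Python, with None standing in for the empty sequence. The helper names add, equals and if_absent are hypothetical and only model the cases used in this section.

```python
from decimal import Decimal

def add(x, y):
    """Arithmetic on possibly missing values: empty in, empty out."""
    if x is None or y is None:
        return None
    return x + y

def equals(x, y):
    """A comparison involving the empty sequence is false."""
    if x is None or y is None:
        return False
    return x == y

def if_absent(x, default):
    """Return x, unless x is missing, in which case return the default."""
    return default if x is None else x

books = [
    {"TITLE": "Data on the Web", "PRICE": Decimal("40.00"),
     "SHIPPING": Decimal("10.00")},
    {"TITLE": "XML in Scotland", "PRICE": Decimal("45.00"),
     "SHIPPING": None},   # no SHIPPING element
]
fifty = Decimal("50.00")

without_default = [b["TITLE"] for b in books
                   if equals(add(b["PRICE"], b["SHIPPING"]), fifty)]
with_default = [b["TITLE"] for b in books
                if equals(add(b["PRICE"],
                              if_absent(b["SHIPPING"], Decimal("5.00"))), fifty)]
```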



12 Type errors 

When evaluating a type system, it is instructive to examine not only those 
programs that pass the type checker but also those that fail it. What errors does 
the type checker catch? 

For this section, it will be helpful to consider a series of similar type decla- 
rations. All of the definitions presume an element that contains a sequence of 
books. 



define element BOOKS { element BOOK * }

For each example we will define the type of book elements, and also possibly a
type of answer elements. 

One common kind of error is to select an element or attribute that is not 
present. This can happen through misunderstanding or misspelling. 

Say we define a book to contain a title and an optional price. 

define element BOOK {
  element TITLE of type xs:string ,
  element PRICE of type xs:decimal ?
}

Here is a query that lists the title and ISBN number of each book.

for $book in document("books.xml")/BOOKS/BOOK return
  <ANSWER>{ $book/TITLE, $book/ISBN }</ANSWER>
∈
element ANSWER {
  element TITLE of type xs:string
} *

This is not a sensible query, because book is defined to contain a title and a 
price, not an ISBN element. 

The usual reason for reporting a type error is that evaluation of the expression 
may go wrong, for instance, by adding an integer to a string. Here “wrong” is 
used in the technical sense of Milner’s motto: “Well-typed programs do not go
wrong”. Achieving this requires a careful definition of “wrong”: we do not define
queries that divide by zero or fail in a “treat as” expression as wrong.

But the expression above is not wrong in this sense! The semantics of the
XPath expression $book/ISBN is perfectly well-defined: it returns the empty
sequence. Similarly, $book/PRICE returns the empty sequence whenever the op- 
tional price is absent. 

Nonetheless, it is possible to issue a warning. The computation may not be 
wrong, but it is wrong-headed. Type inference shows that in this case the type of 
the expression $book/ISBN is () , the type of the empty sequence. Why would one 
ever want to write an expression that always evaluates to the empty sequence? 
There is one such expression that is useful, namely the expression () itself. But 
any other expression that has type () is likely to be an error. Note that the
expression $book/PRICE does not have type (), because the type indicates that 
a price may be present, and so there would be no warning in that case. The 
idea of issuing warnings for expressions with empty type appears to be new with 
XQuery. 
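
The idea of warning on the empty type can be sketched in Python as a lookup against the declared content of BOOK. This is a deliberately simplified illustration, not the inference algorithm of the formal semantics; the names BOOK_CHILDREN, type_of_child_step and warnings_for are assumptions.

```python
# Declared content of BOOK, as in the example: a title and an optional price.
BOOK_CHILDREN = {"TITLE": "xs:string", "PRICE": "xs:decimal ?"}
EMPTY = "()"   # the type of the empty sequence

def type_of_child_step(children, name):
    """Static type of $book/name: the declared type, or () if none is declared."""
    return children.get(name, EMPTY)

def warnings_for(children, selected_names):
    """Warn about every step whose static type is the empty type."""
    return [f"$book/{name} has type ()"
            for name in selected_names
            if type_of_child_step(children, name) == EMPTY]

title_and_isbn = warnings_for(BOOK_CHILDREN, ["TITLE", "ISBN"])
price_only = warnings_for(BOOK_CHILDREN, ["PRICE"])
```

Selecting ISBN produces a warning, while selecting the optional PRICE does not, because its type admits a non-empty value.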

(Why make it a warning rather than an error? Because there are circum- 
stances where it might be reasonable to write such an expression. For instance, 
a single query may be used against different data sources with different schemas. 
For some schemas an expression might have the empty type, while for other 
schemas the same expression might have a non-empty type.)

In many circumstances, types will be declared for both the input and the
output of a query. In this case, the error will be caught even if the mechanism 
of issuing warnings is not in effect. Here are input and output type declarations. 

define element BOOK {
  element TITLE of type xs:string ,
  element PRICE of type xs:decimal
}

define element ANSWER {
  element TITLE of type xs:string ,
  element ISBN of type xs:string
}

Here is the same query as before, modified to explicitly validate its output.

for $book in document("books.xml")/BOOKS/BOOK return
  validate {
    <ANSWER>{ $book/TITLE, $book/ISBN }</ANSWER>
  }

This will report a static error, because the type declared for an answer element
requires it to have both a title and an ISBN number, while the type inferred
shows that it has only a title.

Of course, the type system also catches errors when an expression does go 
wrong, for instance, by adding a boolean to a number. Say that the type for 
books is declared as follows. 

define element BOOK {
  element TITLE of type xs:string ,
  element PRICE of type xs:decimal ,
  element SHIPPING of type xs:boolean ,
  element SHIPCOST of type xs:decimal ?
}

Here is a query that lists the total cost of a book by adding the price and the
shipping.

for $book in document("books.xml")/BOOKS/BOOK return
  <ANSWER>{
    $book/TITLE,
    <TOTAL>{ $book/PRICE + $book/SHIPPING }</TOTAL>
  }</ANSWER>

Here the author of the query has gotten confused: the SHIPPING element contains
not the cost of shipping (that is in SHIPCOST), but a boolean indicating whether
shipping charges apply. In this case, the expression may indeed go wrong, and a
static type error occurs in the usual way.

As explained in the previous section, arithmetic operators are specially de-
signed to accommodate null data. If the query writer has forgotten that some
element or attribute is optional, this may again yield a wrong-headed result
without the query actually going wrong. Such errors can often be detected if a 
declaration is also provided for the output. 

Say that the type of input and output is as follows. 

define element BOOK {
  element TITLE of type xs:string ,
  element PRICE of type xs:decimal ,
  element SHIPPING of type xs:decimal ?
}

define element ANSWER {
  element TITLE of type xs:string ,
  element TOTAL of type xs:decimal
}

Here is a query that lists the title and total cost of each book.

for $book in /BOOKS/BOOK return
  validate {
    <ANSWER>{
      $book/TITLE,
      <TOTAL>{ $book/PRICE + $book/SHIPPING }</TOTAL>
    }</ANSWER>
  }

This time the shipping cost is kept in an element called SHIPPING, but the cost
is optional. If it is not present, the sum yields an empty sequence. However,
the error can be detected because a type has been declared for the answer, and
this type requires that the TOTAL element contain a decimal, which is
required, not optional.

13 Functions 

Functions are straightforward. Say that a book element contains a title, price, 
and shipping cost. 

define element BOOK {
  element TITLE of type xs:string ,
  element PRICE of type xs:decimal ,
  element SHIPPING of type xs:decimal ?
}

Here is a function that returns the cost of ordering a book.

define function cost($book as element BOOK) as xs:decimal? {
  $book/PRICE + $book/SHIPPING
}

14 Recursion

XML data may have a recursive structure. Such structures may be processed
using recursive functions.

Here are declarations for a recursive part hierarchy. 

define element PART {
  attribute NAME of type xs:string &
  attribute COST of type xs:decimal ,
  element PART *
}

Here is some data. The costs are incremental, that is, the cost of assembling the
subparts to yield the part.

<PART NAME="system" COST="500.00">
  <PART NAME="monitor" COST="1000.00"/>
  <PART NAME="keyboard" COST="500.00"/>
  <PART NAME="pc" COST="500.00">
    <PART NAME="processor" COST="2000.00"/>
    <PART NAME="dvd" COST="1000.00"/>
  </PART>
</PART>

Here is a function that computes new values for the cost attribute of each part.
The new value is the total cost.

define function total($part as element PART) as element PART {
  let $subparts := $part/PART/total(.)
  return
    <PART NAME="{ $part/@NAME }"
          COST="{ $part/@COST + sum($subparts/@COST) }">{
      $subparts
    }</PART>
}

Here is the result of applying the function to the given data.

total(document("part.xml")/PART)
=>
<PART NAME="system" COST="5500.00">
  <PART NAME="monitor" COST="1000.00"/>
  <PART NAME="keyboard" COST="500.00"/>
  <PART NAME="pc" COST="3500.00">
    <PART NAME="processor" COST="2000.00"/>
    <PART NAME="dvd" COST="1000.00"/>
  </PART>
</PART>
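
The recursion can be traced with a Python sketch that mirrors the definition of total: a part's new cost is its own incremental cost plus the sum of the total costs of its direct subparts. The sketch uses xml.etree.ElementTree and Decimal arithmetic; it imitates the function above rather than running XQuery.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

PART_XML = """
<PART NAME="system" COST="500.00">
  <PART NAME="monitor" COST="1000.00"/>
  <PART NAME="keyboard" COST="500.00"/>
  <PART NAME="pc" COST="500.00">
    <PART NAME="processor" COST="2000.00"/>
    <PART NAME="dvd" COST="1000.00"/>
  </PART>
</PART>
"""

def total(part):
    """Return a new PART whose COST is the part's own cost plus its subparts' totals."""
    subparts = [total(child) for child in part.findall("PART")]
    cost = Decimal(part.get("COST")) + sum(
        (Decimal(sub.get("COST")) for sub in subparts), Decimal("0"))
    result = ET.Element("PART", {"NAME": part.get("NAME"), "COST": str(cost)})
    result.extend(subparts)   # the rebuilt subparts become the children
    return result

system = total(ET.fromstring(PART_XML))
costs = {p.get("NAME"): p.get("COST") for p in system.iter("PART")}
```

With the data above, pc comes to 3500.00 (500.00 plus 2000.00 and 1000.00) and system to 5500.00 (500.00 plus 1000.00, 500.00 and 3500.00).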

15 Wildcards and computed names

An important aspect of XML is that it is semi-structured: some data may be 
strongly constrained as to its structure, while other data is weakly constrained 
or totally unconstrained. 

Unconstrained data is modelled with the type element. All element types 
are included in this type, and one may apply “treat as” expressions to convert 
from this type to a more specific type. The type element is a little like the 
type object in an object-oriented language, although there is not much that is 
object-oriented about XQuery. 

Here is a function that swaps all attributes and elements within an element. 

define function swap($e as element) as element {
  element { name($e) } {
    for $x in $e/* return
      attribute { name($x) } { data($x) } ,
    for $x in $e/@* return
      element { name($x) } { data($x) }
  }
}

This function uses the XPath wildcards $e/* to select all elements in $e and 
$e/@* to select all attributes in $e, and the wildcard type element which denotes 
the type of all elements. For example, 

swap( <BOOK YEAR="2003"><TITLE>XQuery</TITLE></BOOK> )
=>
<BOOK TITLE="XQuery"><YEAR>2003</YEAR></BOOK>
∈
element
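
For comparison, here is a rough Python counterpart of swap using xml.etree.ElementTree. It approximates data($x) by the text content of a child element, so it only handles children that contain plain text; that restriction is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

def swap(e):
    """Turn child elements into attributes and attributes into child elements."""
    result = ET.Element(e.tag)                  # element { name($e) } { ... }
    for child in e:                             # for $x in $e/*
        result.set(child.tag, child.text or "")
    for name, value in e.attrib.items():        # for $x in $e/@*
        ET.SubElement(result, name).text = value
    return result

book = ET.fromstring('<BOOK YEAR="2003"><TITLE>XQuery</TITLE></BOOK>')
swapped = ET.tostring(swap(book), encoding="unicode")
```

Both the element name and the attribute name are computed at run time from the input, which is what the logical constructor notation makes possible.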

16 XPath and XQuery 

The translation of XPath path expressions into XQuery for expressions is a bit 
more subtle than previously indicated. Each XPath is guaranteed to return a se- 
quence in which nodes appear in document order without duplicates. (Document 
order is the order in which the start tags appear, that is, a pre-order traversal.) 
In previous sections, we used the following translation. 

$root/BOOK/AUTHOR
=
for $dot in $root/BOOK
return $dot/AUTHOR

This works well when each step of the XPath is selecting children. But in general,
each step in an XPath may select arbitrary descendants, or even ancestors, of
the current node. This means that in general it is necessary to apply the function
distinct-doc-order, which sorts the nodes into document order and removes
duplicates.

Here is an example of the difference between sorting and not sorting into
document order. Consider the following document.

<WARNING>
  <P>
    Do <EM>not</EM> press button,
    computer will <EM>explode!</EM>
  </P>
</WARNING>

Here is an XPath that selects all elements within the document, in document
order.

document("warning.xml")//*
=>
<WARNING>
  <P>
    Do <EM>not</EM> press button,
    computer will <EM>explode!</EM>
  </P>
</WARNING>,
<P>
  Do <EM>not</EM> press button,
  computer will <EM>explode!</EM>
</P>,
<EM>not</EM>,
<EM>explode!</EM>

Similarly, here is an XPath that selects all the text node descendants of any
element in the document.

document("warning.xml")//*/text()
=
distinct-doc-order(
  let $root := document("warning.xml") return
    for $dot in $root//* return
      $dot/text()
)
=>
"Do ",
"not",
" press button, computer will ",
"explode!"

But note what happens if sorting in document order is omitted:

let $root := document("warning.xml") return
  for $dot in $root//* return
    $dot/text()
=>
"Do ",
" press button, computer will ",
"not",
"explode!"

(This example of a safety-critical application of a sorting function is due to 
Michael Kay.) 
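
The two orders can be reproduced in Python with xml.etree.ElementTree. The document is written without the whitespace between tags, which the text above also ignores; the helper text_children is hypothetical and collects the text nodes that are direct children of an element.

```python
import xml.etree.ElementTree as ET

WARNING_XML = ("<WARNING><P>Do <EM>not</EM> press button, "
               "computer will <EM>explode!</EM></P></WARNING>")
root = ET.fromstring(WARNING_XML)

def text_children(element):
    """Text nodes that are direct children of element, in order."""
    pieces = [element.text] + [child.tail for child in element]
    return [piece for piece in pieces if piece]

# for $dot in $root//* return $dot/text(), with no sorting afterwards.
# root.iter() visits every element, including WARNING itself.
unsorted_texts = [text for el in root.iter() for text in text_children(el)]

# itertext() walks all text nodes in document order.
document_order = list(root.itertext())
```

The unsorted list groups both text children of P before the contents of the EM elements, which is exactly the misleading reading shown above.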

One consequence of the importance of document order is that each node has 
a distinct node identity. Anyone from the functional programming community 
might well expect a tree to be characterized solely by its contents. For instance, 
recall the following query. 

document("books.xml")/BOOKS/BOOK/AUTHOR
=>
<AUTHOR>Abiteboul</AUTHOR>,
<AUTHOR>Buneman</AUTHOR>,
<AUTHOR>Suciu</AUTHOR>,
<AUTHOR>Buneman</AUTHOR>

Here one might expect the second and fourth nodes to be identical. But in 
XQuery they are not at all identical — the former is an author node from the 
first book and the latter is an author node from the second book. They are
distinct, and the former precedes the latter in document order. 

One consequence of this is that sharing is reduced and copying is increased. 
Consider an expression that wraps the above result sequence in a new element. 

<AUTHORS>{

document("books.xml")/BOOKS/BOOK/AUTHOR
}</AUTHORS>

=>

<AUTHORS>

<AUTHOR>Abiteboul</AUTHOR>,

<AUTHOR>Buneman</AUTHOR>,

<AUTHOR>Suciu</AUTHOR>,

<AUTHOR>Buneman</AUTHOR>

</AUTHORS>

In this case, the author elements are now children of the newly created AUTHORS 
element, not of the old BOOK elements. So in order to reflect node identity cor- 
rectly, all of the elements must be copied, yielding elements with distinct identity. 
In general, this can be quite expensive, and is not at all the sort of thing that 
functional programmers are used to. It’s life, Jim, but not as we know it! 

Despite node identity being something quite outside the functional program-
ming mainstream, a well-known functional programming trick can be used to
minimize the copying required to maintain node identity. Use a bit in each node
to keep track of whether there is one pointer to it or many: if there is only one 
pointer, then it does not need to be copied when it is installed in another node; 
if there is more than one pointer, then a (deep) copy is required. 
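
The one-bit scheme might be sketched as follows in Python. This is an illustrative toy, not a description of how Galax or any other XQuery implementation works. The names Node and install are hypothetical, and the bit is handled conservatively: it is set as soon as a node is installed in a parent and is never cleared, so any later installation of that node copies it.

```python
import copy


class Node:
    """A toy element node; Python object identity stands in for node identity."""

    def __init__(self, tag, text=None):
        self.tag = tag
        self.text = text
        self.children = []
        # The one bit: False means "only one pointer to this node",
        # True means "there may be many".
        self.shared = False


def install(parent, child):
    """Make child a child of parent, copying only when child may be shared."""
    if child.shared:
        # Another pointer may exist, so build a subtree with fresh identities.
        child = copy.deepcopy(child)
    # The parent now points at the node, so a later install must copy it.
    child.shared = True
    parent.children.append(child)
    return child


book = Node("BOOK")
# A freshly constructed node has a single pointer: installed without copying.
buneman = install(book, Node("AUTHOR", "Buneman"))

authors = Node("AUTHORS")
# The node is already a child of BOOK, so wrapping it forces a deep copy.
wrapped = install(authors, buneman)

print(wrapped is buneman, wrapped.text == buneman.text)
```

Under these assumptions the first installation reuses the node and only the second pays for a copy. The copy has the same contents but a distinct identity, and the original AUTHOR remains a child of BOOK.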

So even though the XML data model is quite different from the usual func- 
tional programming approach, ideas from the functional programming commu- 
nity prove quite valuable. 

17 Conclusions 

Galax, an implementation of XQuery constructed by Jerome Simeon, Mary 
Fernandez and others, is available from http://db.bell-labs.com/galax/. 

Functional programmers pride themselves on the elegance of functional lan- 
guages. But there are relatively few functional programmers around to make 
such claims. 

Anyone who has delved into XML at all deeply knows that it has many corners
which no one could describe as elegant. These are in part the result of XML 
evolving from an earlier standard, SGML, and the need to meet the needs of 
many different communities of users. But as a result, XML has many more users 
than functional languages. 

Many opportunities arise as a result of this. XML has lots of room for
improvement. XML can benefit by applying ideas from the functional programming
community. XML has many users, so even a small benefit has a big cumulative 
effect. And XML will be with us for many years to come, so a little effort now 
can have a large impact on the future. 

XML needs you! 